Official

H - Stroll Editorial by en_translator


This problem is about what is called divide-and-conquer FFT (FFT=Fast Fourier Transform), which is a strong weapon for finding a value of a sequence.
It is one of the difficult application of FFT / convolution, but once you understand it, it can be implemented in about 20 lines, so if you are interested in combinatorics, we recommend you to study it.

Naive DP solution

First, let us consider a naive DP (Dynamic Programming). Let us define a DP table \(d_{s,t}\) as follows:

  • \(d_{s, t} :=\) the number of paths of length \(t\) kilometers, ending with Point \(s\).

What we want is \(d_{1, T}\).
Then, \(d_{s, t}\) can be computed if we know \(d_{\ast, u}\) for all \(u \lt t\), so we can fill the DP table in the increasing order of \(d_{\ast, u}\).
Strictly speaking, \(d_{s,t}\) can be computed as follows, using the variables in the problem statement:

\[d_{s,t} = \sum_{\substack{b_i = t \\ 1 \leq i \leq M }} \left( \sum_{0 \leq u \lt t} d_{a_i, u} \times p_{i, t - u} \right) + \sum_{\substack{a_i = t \\1 \leq i \leq M }} \left( \sum_{0 \leq u \lt t} d_{b_i, u} \times p_{i, t - u} \right) \]

The time complexity of the DP above is \(\mathrm{O}(M T^2)\) (which is explained later), but under the given constraints it costs \(MT^2 = 1.6 \times 10^{10}\), and the constant factor is bad, so it will receive TLE. Eliminating \(T^2\) part is the key point of the algorithm we explain here.

Why does it cost \(\mathrm{O}(T^2)\) time?

Before explaining the algorithm, we will explain why is intrinsically difficult to solve this problem in a quasilinear time of \(T\).

In this chapter and later, we will explain the case \(N = 2\) for simplicity; the cases \(N \geq 3\) is essentially the same, except that the number of edges \(O(M)\) is multiplied to the computational complexity.

Let \(p_u\) be the number of edges of length \(u\) connecting \(1 - 2\), then the DP table \(d_{1,t}, d_{2,t}\) for \(N = 2\) can be expressed as follows:

\[ d_{1,t} = \sum_{0 \leq u \lt t} d_{2,u} \times p_{t-u} \]

\[ d_{2,t} = \sum_{0 \leq u \lt t} d_{1,u} \times p_{t-u} \]

If you are familiar to the recent ABC, you may realize that “this expression is similar to that of convolution!” If \(d_{1,u}, d_{2,u}\) in the left hand sides of the expressions above are constants, we can compute them fast enough in a total of \(\mathrm{O}(T \log T)\) using the convolution with the aid of FFT.
However, if we try to apply FFT naively, we need the values of \(d_2\) to compute \(d_1\), and the values of \(d_1\) to compute \(d_2\), … so we are stuck.

Like this problem, some DP requires a DP table to fill another DP table. Such DP is called online DP.
As you can see the expression above, online DP depends on the values of other DP table, so it is difficult to optimize, and generally the only way to solve is to fill the DP table in the order of DAG (Directed Acyclic Graph) naively. Moreover, if we calculate the expression above naively, it will cost \(\mathrm{O}(T^2)\) time to fill the DP table.

Now, we have explained why this problem is cumbersome. It seems that we are facing a dead end, but actually, if online DP has some good properties, it is known that the computational complexity can be improved compare to the naive solution.
In this problem, we can use the property that the contributions can be represented as a convolution, in order to achieve the “convolution that is computable for online” with a combination of divide-and-conquer and convolution!

Divide-and-conquer FFT

Now let’s get down to business. In this chapter, we will explain the algorithm called “divide-and-conquer FFT.” (The algorithm is called in various ways, materials to materials, like cdq divide-and-conquer + FFT / online FFT / online convolution / relaxed multiplication, but in this document we will solely call “divide-and-conquer FFT.”)

First of all, we will visualize the transitions of DP of calculating the value of DP table for the segment \([l, r)\). Here, the leftmost circle correspond to the value of \(d_{\ast, l}\), and the rightmost one to that of \(d_{\ast, r-1}\).

We can see that the DP naively costs the time complexity of the square of the number of vertices, as the graph above is a directed version of an undirected complete graph.

The divide-and-conquer FFT is a step-by-step algorithm in which the DP with the transitions illustrated above is computed through the following three steps.

In the description below, let \(m = \left\lfloor \frac{l+r}{2} \right\rfloor\). Additionally, we guarantee the following invariant conditions.

  • By the time we compute the values for the segment \([l, r)\), the following conditions are already met:
    • The values for \([0, l)\) on the DP table is already determined.
    • The sum of contributions from \([0, l)\) through \([l, r)\) is already applied to the DP table \([l ,r)\).

1. Calculate the values for the segment \([l, m)\) recursively

First, we recursively compute the values for segment \([l, m)\) to determine the values of DP table in segment \([l, m)\). By the invariant conditions, the contributions from outside \([l, r)\) is already applied to the DP, so we can compute the values for \([l, m)\) without issues.
In the implementation, we call f(l, m) inside f(l, r).

2. Calculate the contributions from \([l, m)\) to \([m, r)\)

Next, we calculate the contributions from \([l, m)\) to \([m, r)\).
When \(N = 2\), the contributions corresponding to the description is expressed as follows. (\(\leftarrow\) means an assignment.)

\[ d_{1,t} \leftarrow d_{1,t} + \sum_{l \leq u \lt m} d_{2,u} \times p_{t-u} \]

\[ d_{2,t} \leftarrow d_{2,t} + \sum_{l \leq u \lt m} d_{1,u} \times p_{t-u} \]

Last time, we explained that the transition cannot be optimized with FFT because the values \(d_{1,u}, d_{2,u}\) is not determined.
However, this time, we have already computed \([l, m)\) by divide-and-conquer, so now we can apply the optimization of computing \(\Sigma\) in the expression above with convolution!
Therefore, this part can be computed in an \(\mathrm{O} (B \log B)\) time, where \(r - l = B\).

3. Calculate the values for the segment \([m, r)\) recursively

All the left is winning run. We finally compute the values for the segment \([m, r)\) recursively to determine the values of DP table in the segment \([m, r)\).

What about the complexity?

Now let us analyze the computational complexity of this algorithm. Let \(T(B)\) be the time complexity of using divide-and-conquer algorithm for the segment of width \(B\), then

\[T(B) = 2 T\left(\frac{B}{2}\right) + \mathrm{O} (B \log B).\]

A strict evaluation of complexity including recursive relation is difficult. We will explain an easier way: let

\[b = \log B, U(b) = \frac{T(2^b)}{2^b} = \frac{T(B)}{B}.\]

By dividing the both hand sides of original equation by \(B\), we obtain

\[ U(b)= U(b - 1) + \mathrm{O}(b) \]

so we can prove \(U(b) = \mathrm{O}(b^2) = \mathrm{O}(\log ^2 B)\). Therefore, we obtain

\[ T(B) = B \times U(b) = \mathrm{O}(B \log^2 B).\]

実装例

By applying the algorithm described above to the segment \([0, T+1)\), the problem has been solved in a total of \(\mathrm{O}(M T \log^2 T)\) time.
We will show the sample code below. (Like the sample code, using AtCoder Library will make implementation easier.)

#include <iostream>
#include <utility>
#include <vector>

#include "atcoder/convolution.hpp"
#include "atcoder/modint.hpp"
using namespace std;
using mint = atcoder::modint998244353;

vector<mint> d[11], p[11][11];
vector<pair<int, int>> es;

vector<mint> get_vec(const vector<mint>& v, int l, int r) {
  return {begin(v) + l, begin(v) + r};
}
void online_convolution(int l, int r) {
  if (l + 1 == r) return;
  int m = (l + r) / 2;
  online_convolution(l, m);
  for (auto& [a, b] : es) {
    auto da = get_vec(d[a], l, m);
    auto pab = get_vec(p[a][b], 0, r - l);
    auto ra = atcoder::convolution(da, pab);
    for (int i = m; i < r; i++) d[b][i] += ra[i - l];
    auto db = get_vec(d[b], l, m);
    auto rb = atcoder::convolution(db, pab);
    for (int i = m; i < r; i++) d[a][i] += rb[i - l];
  }
  online_convolution(m, r);
}

int main() {
  ios::sync_with_stdio(false);
  std::cin.tie(nullptr);
  int N, M, T, a, b, x;
  cin >> N >> M >> T;
  for (int i = 0; i < M; i++) {
    cin >> a >> b;
    --a, --b;
    es.emplace_back(a, b);
    p[a][b].resize(T + 1);
    for (int j = 1; j <= T; j++) cin >> x, p[a][b][j] = x;
  }
  for (int i = 0; i < N; i++) d[i].resize(T + 1);
  d[0][0] = 1;
  online_convolution(0, T + 1);
  cout << d[0][T].val() << "\n";
}

Related problems

We will show you some related problems that appeared in AtCoder in the past.

posted:
last update: