H - Stroll Editorial by cai_lw


We present an \(O(N^3T\log T)\) solution, which may be faster or slower than the \(O(MT\log^2T)=O(N^2T\log^2T)\) official solution, depending on the constraints. The bottleneck of this algorithm is performing \(O(N^3)\) multiplications of degree-\(T\) polynomials, so an efficient implementation of FFT/convolution, such as the one in AtCoder Library (ACL), is essential. In practice, this solution seems slower than the official solution.

1. Matrices over a polynomial ring

Define \(f_{i,j}(x)=\sum_{t=1}^{T}p_{i,j,t}x^t\;(1\leq i,j \leq N)\) where \(p_{i,j,t}\) is the number of \(t\)-kilometer roads between point \(i\) and \(j\). All \(p_{i,j,t}\) can be directly read from input. Note that \(f_{i,j}(x)=0\) if \(i=j\) or there is no edge between point \(i\) and \(j\).

Define \(g^{(l)}_{i,j}(x)=\sum_{t=1}^{\infty}q_{i,j,t,l}x^t\;(1\leq i,j \leq N,1\leq l)\) where \(q_{i,j,t,l}\) is the number of courses from point \(i\) to \(j\) that consist of exactly \(l\) roads and whose total length is exactly \(t\) kilometers. The answer to the original problem (“the answer”) can be expressed as \([x^T]\sum_{l=1}^{T}g^{(l)}_{1,1}(x)\). Since \(l\) roads are at least \(l\) kilometers long in total, we don’t need to consider any \(l>T\).

Consider the naive dynamic programming (DP) solution. To count courses with \(l\) roads from point \(i\) to \(j\), we extend each course with \(l-1\) roads from \(i\) to \(k\) by a road from \(k\) to \(j\), for all \(1\leq k \leq N\). Formally, \(q_{i,j,t,l}=\sum_{k=1}^{N}\sum_{t'=1}^{t-1}q_{i,k,t',l-1}p_{k,j,t-t'}\).

By collecting such equations for all \(t\), we have \(g^{(1)}_{i,j}(x)=f_{i,j}(x)\) and \(g^{(l)}_{i,j}(x)=\sum_{k=1}^{N}g^{(l-1)}_{i,k}(x)f_{k,j}(x)\;(2\leq l)\). We notice that this formula has the same form as the definition of matrix multiplication.

Let \(A\) be an \(N\times N\) matrix where \(A_{i,j}=f_{i,j}(x)\). It immediately follows that \(A^l_{i,j}=g^{(l)}_{i,j}(x)\), where \(A^l\) denotes \(A\) raised to the \(l\)th power. The answer can then be expressed as \([x^T](A+A^2+\dots+A^T)_{1,1}\).

Moreover, since terms higher than \(x^T\) do not contribute to the answer, we can always discard all terms higher than \(x^T\) after polynomial multiplication. Thus, the complexity of polynomial multiplication is always \(O(T\log T)\).

In algebraic terms, we are now considering \(N\times N\) matrices over the polynomial ring \(\mathbf{F}_{998244353}[x]/x^{T+1}\). We will now simply call this ring \(\mathbf{R}\).
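For concreteness, here is a minimal sketch of arithmetic over \(\mathbf{R}\): truncated polynomial multiplication as the element-wise product, and matrix multiplication built on top of it. The helper names are hypothetical, and the naive \(O(T^2)\) product is used only for clarity; the actual solution would use atcoder::convolution and then truncate.

```cpp
#include <cassert>
#include <vector>
using namespace std;

using u64 = unsigned long long;
const u64 MOD = 998244353;
using Poly = vector<u64>;  // coefficients a_0..a_T of an element of R

// Product in R: multiply and discard terms above x^T (naive O(T^2);
// replace with atcoder::convolution + truncation for O(T log T)).
Poly pmul(const Poly &a, const Poly &b) {
    int n = a.size();  // n = T+1
    Poly c(n, 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; i + j < n; j++)  // terms with i+j > T are dropped
            c[i + j] = (c[i + j] + a[i] * b[j]) % MOD;
    return c;
}

// Matrix product over R: C[i][j] = sum_k A[i][k] * B[k][j]
vector<vector<Poly>> matmul(const vector<vector<Poly>> &A,
                            const vector<vector<Poly>> &B) {
    int N = A.size(), n = A[0][0].size();
    vector<vector<Poly>> C(N, vector<Poly>(N, Poly(n, 0)));
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                Poly d = pmul(A[i][k], B[k][j]);
                for (int t = 0; t < n; t++)
                    C[i][j][t] = (C[i][j][t] + d[t]) % MOD;
            }
    return C;
}
```

For example, with two points joined by a single 1-kilometer road, \(A_{1,2}=A_{2,1}=x\) and \(A^2_{1,1}=x^2\): the single course \(1\to 2\to 1\) with two roads and total length 2.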

2. \(I+A+A^2+...=(I-A)^{-1}\)

\(A+A^2+\dots+A^T\) can be computed in \(O(\log T)\) matrix multiplications, in a similar fashion to exponentiation by squaring. Since a matrix multiplication needs \(O(N^3)\) element-wise multiplications, and an element-wise multiplication is an \(O(T\log T)\) polynomial multiplication, the overall complexity of this approach is \(O(N^3T\log^2 T)\). Under the constraints of this problem, \(N^3T\log^2 T\approx 9\times10^9\) operations cannot be done within 5 seconds.

Let \(I\) be the \(N\times N\) identity matrix, whose diagonal elements are the multiplicative identity of \(\mathbf{R}\), which is \(1\), and whose non-diagonal elements are the additive identity of \(\mathbf{R}\), which is \(0\). In particular, \(I_{1,1}=1\), so \([x^T]I_{1,1}=0\). Thus, we can add \(I\) to the summation \(A+A^2+\dots+A^T\) without affecting the answer. In addition, \(A^{T+1}=0\) because \(T+1\) roads are at least \(T+1\) kilometers long in total, so no entry of \(A^{T+1}\) has a non-zero term of degree \(T\) or lower. Thus, we can also add \(A^l\;(l>T)\) to the summation without affecting the answer.

Due to the observations above, we can apply the well-known identity \(I+A+A^2+\dots=(I-A)^{-1}\) and express the answer as \([x^T](I-A)^{-1}_{1,1}\), assuming that \(I-A\) is invertible. Since one can invert a matrix in \(O(N^3)\) element-wise multiplications by Gaussian elimination, the complexity of this approach is \(O(N^3T\log T)\), with \(N^3T\log T\approx 6\times10^8\) operations fitting in the time limit of 5 seconds.

In Section 3, we will show when division (required by Gaussian elimination) is possible in \(\mathbf{R}\), and how to do it when it is; in Section 4, we will show that \(I-A\) is always invertible.

3. Inverse convolution, or polynomial division

Let \(p=a_0+a_1x+\dots+a_Tx^T\in\mathbf{R}\). We will find the convolutional reciprocal of \(p\), i.e. \(q=b_0+b_1x+\dots+b_Tx^T\in\mathbf{R}\) such that \(pq=1\). Note that \(a_i,b_i\;(0\leq i\leq T)\) are elements of the field \(\mathbf{F}_{998244353}\).

Consider the following naive algorithm for finding \(b_0,b_1,\dots,b_T\). Its correctness is obvious.

  • \(a_0b_0=1\implies b_0=1/a_0\)
  • \(a_0b_1+a_1b_0=0\implies b_1=-a_1b_0/a_0\)
  • \(a_0b_2+a_1b_1+a_2b_0=0\implies b_2=-(a_2b_0+a_1b_1)/a_0\)
  • \(\dots\)

We can see that \(q\) exists and is uniquely determined by \(p\) if and only if \(a_0\neq 0\). This is important for proving that \(I-A\) is always invertible in the next section.
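The bullet points above translate directly into code. The following is a sketch (hypothetical names, plain unsigned arithmetic instead of a modint class) of the naive \(O(T^2)\) algorithm:

```cpp
#include <cassert>
#include <vector>
using namespace std;

using u64 = unsigned long long;
const u64 MOD = 998244353;

u64 pow_mod(u64 b, u64 e) {  // b^e mod MOD; 1/a_0 = a_0^{MOD-2} by Fermat
    u64 r = 1;
    for (b %= MOD; e; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

// Naive O(T^2) reciprocal in F_998244353[x]/x^{T+1}; requires a[0] != 0.
vector<u64> recip_naive(const vector<u64> &a) {
    int n = a.size();
    vector<u64> b(n, 0);
    u64 inv_a0 = pow_mod(a[0], MOD - 2);
    b[0] = inv_a0;  // a_0 b_0 = 1
    for (int t = 1; t < n; t++) {
        // [x^t] of a*b must be 0: a_0 b_t + (a_1 b_{t-1} + ... + a_t b_0) = 0
        u64 s = 0;
        for (int i = 1; i <= t; i++) s = (s + a[i] * b[t - i]) % MOD;
        b[t] = (MOD - s) % MOD * inv_a0 % MOD;  // b_t = -s / a_0
    }
    return b;
}
```

For example, the reciprocal of \(1-x\) is \(1+x+x^2+\dots\), truncated at degree \(T\).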

This naive algorithm runs in \(O(T^2)\), but there are \(O(T\log T)\) algorithms based on fast polynomial division. For example, see https://stackoverflow.com/questions/44770632/fft-division-for-fast-polynomial-division. An implementation using atcoder::convolution is shown below.

// Reciprocal of a (mod x^n, n = a.size()) by Newton's iteration: given
// b_low with a*b_low ≡ 1 (mod x^{ceil(n/2)}), the refinement
// b = 2*b_low - b_low^2*a satisfies a*b ≡ 1 (mod x^n).
vector<mint> conv_recip(const vector<mint> &a){
    int n=a.size(),n_low=(n+1)/2;
    if(n==1)return {1/a[0]};                    // base case: b_0 = 1/a_0
    vector<mint> a_low(a.begin(),a.begin()+n_low);
    vector<mint> b_low=conv_recip(a_low);       // reciprocal mod x^{n_low}
    vector<mint> b=convolution(convolution(b_low,b_low),a);  // b_low^2 * a
    b.resize(n);
    // Since a*b_low ≡ 1 (mod x^{n_low}), the low half of b_low^2*a equals
    // b_low itself, so 2*b_low - b_low^2*a just negates the high half.
    for(int i=n_low;i<n;i++)
        b[i]=-b[i];
    return b;
}

4. \(I-A\) is always invertible

For any matrix \(M\in\mathbf{R}^{N\times N}\), let \(C(M)\in\mathbf{F}_{998244353}^{N\times N}\) be such that \(C(M)_{i,j}=[x^0]M_{i,j}\). In other words, \(C(M)\) consists of the constant terms of the corresponding elements of \(M\). Since \(C(a)+C(b)=C(a+b)\), \(C(a)C(b)=C(ab)\) and \(C(a/b)=C(a)/C(b)\;(C(b)\neq 0)\) hold for any elements \(a,b\in\mathbf{R}\), the same identities hold for matrices \(M_1,M_2\in\mathbf{R}^{N\times N}\).

When doing Gaussian elimination on \(M\), we replicate the exact same steps on \(C(M)\). By the observations above, this is equivalent to doing Gaussian elimination on \(C(M)\), as Gaussian elimination only involves addition and multiplication of elements, as well as division by single elements. If we never fail to find a pivot invertible in \(\mathbf{R}\), then \(M\) is invertible, because we can turn \(M\) into the identity matrix \(I\) by the end of the algorithm. By the conclusion of Section 3, this is true if and only if we never fail to find a non-zero pivot when doing Gaussian elimination on \(C(M)\). In other words, \(M\) is invertible if and only if \(C(M)\) is invertible.

For \(M=I-A\), since no element of \(A\) has a constant term (Section 1), \(C(M)=I\) is invertible, hence \(I-A\) is invertible. Moreover, when doing Gaussian elimination on \(I\), it remains \(I\) throughout the algorithm. Thus, at any moment, all diagonal elements of \(M\) are invertible, and we can always use a diagonal element as the pivot.

5. Constant factor optimization

Finally, we summarize our proposed algorithm as follows:

  • Construct matrix \(A\in\mathbf{R}^{N\times N}\) where \(A_{i,j}=f_{i,j}(x)\) from the input.
  • Compute \(I-A\), where \(I\) is the identity matrix over \(\mathbf{R}\).
  • Invert \(I-A\) by Gaussian elimination. In the \(i\)th step, choose the \((i,i)\)-th element as the pivot.
  • Output \([x^T](I-A)^{-1}_{1,1}\).

Simply copy-pasting a matrix inversion algorithm may result in TLE depending on the implementation. However, there is no need to compute the full inverse matrix, as we are only interested in one of its elements. With a simple optimization, we can speed up the algorithm several times.

Let \(b=[1,0,\dots,0]^\top\in\mathbf{R}^N\). The answer can be expressed as \([x^T]((I-A)^{-1}b)_1\). Finding \(c=(I-A)^{-1}b\) is the same as solving the linear system \((I-A)c=b\). Since we are only interested in \(c_1\), we only need to turn \(I-A\) into a lower triangular matrix \(L\); then \(c_1=b_1/L_{1,1}\) follows. This needs only \(N^3/3+o(N^3)\) element-wise multiplications, about 3 times as fast as the \(N^3+o(N^3)\) element-wise multiplications for a full matrix inverse.
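As an illustration of this optimization, here is a sketch (hypothetical names; naive \(O(T^2)\) polynomial helpers included for self-containment, where the real implementation would use atcoder::convolution and the \(O(T\log T)\) reciprocal) that solves \((I-A)c=e_1\) for \(c_1\) only, by eliminating the above-diagonal entries column by column, from the last column to the second:

```cpp
#include <cassert>
#include <vector>
using namespace std;

using u64 = unsigned long long;
const u64 MOD = 998244353;
using Poly = vector<u64>;  // coefficients a_0..a_T of an element of R

u64 pow_mod(u64 b, u64 e) {  // b^e mod MOD, for 1/a_0 = a_0^{MOD-2}
    u64 r = 1;
    for (b %= MOD; e; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

Poly pmul(const Poly &a, const Poly &b) {  // product in R, naive O(T^2)
    int n = a.size();
    Poly c(n, 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; i + j < n; j++)
            c[i + j] = (c[i + j] + a[i] * b[j]) % MOD;
    return c;
}

Poly precip(const Poly &a) {  // reciprocal in R (Section 3), a[0] != 0
    int n = a.size();
    Poly b(n, 0);
    u64 inv = pow_mod(a[0], MOD - 2);
    b[0] = inv;
    for (int t = 1; t < n; t++) {
        u64 s = 0;
        for (int i = 1; i <= t; i++) s = (s + a[i] * b[t - i]) % MOD;
        b[t] = (MOD - s) % MOD * inv % MOD;
    }
    return b;
}

// Answer = [x^T] c_1 where (I-A)c = e_1 and M = I-A (0-indexed entries).
u64 answer_from_IminusA(vector<vector<Poly>> M, int T) {
    int N = M.size();
    // Eliminate above-diagonal entries, last column to second, so M becomes
    // lower triangular. Pivot rows have index >= 1, and their right-hand
    // side entries (of e_1) are 0, so the RHS of row 0 stays 1 throughout.
    for (int col = N - 1; col >= 1; col--) {
        Poly inv = precip(M[col][col]);  // diagonal pivots stay invertible
        for (int row = 0; row < col; row++) {
            Poly f = pmul(M[row][col], inv);
            for (int k = 0; k <= col; k++) {  // columns > col are already 0
                Poly d = pmul(f, M[col][k]);
                for (int t = 0; t <= T; t++)
                    M[row][k][t] = (M[row][k][t] + MOD - d[t]) % MOD;
            }
        }
    }
    return precip(M[0][0])[T];  // c_1 = 1 / L_{1,1}
}
```

For two points joined by a single 1-kilometer road and \(T=2\), \(I-A=\begin{pmatrix}1&-x\\-x&1\end{pmatrix}\), and the answer is \([x^2]\frac{1}{1-x^2}=1\), matching the single course \(1\to 2\to 1\).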

Sample implementation

2254ms, with comments, no excessive copy-pasted templates

https://atcoder.jp/contests/abc213/submissions/24906179
