G - Prime Circuit Editorial by en_translator
This problem features Kinoshita-Li composition, the most notable discovery of 2024 in this field.
Composition of formal power series (FPS)
FPS is a powerful tool in competitive programming, as you can probably tell from past problems of ABC, ARC, and AGC. It forms the basis of theories not only in competitive programming, but also in combinatorial mathematics and cryptography. This time, let us feature a basic operation that supports the applications of FPS.
There are mainly three ways to “merge” two FPSs: sum \(f(x) + g(x)\), product \(f(x) g(x)\), and composition \(g(f(x))\). It is generally known that the complexities are:
- sum \(f(x) + g(x)\): \(\mathrm{O}(n)\) (where \(n\) is the degree)
- product \(f(x) g(x)\) using FFT \(\mathrm{O}(n \log n)\)
- composition \(g(f(x))\): ???
So what about the composition \(g(f(x))\)?
The textbook algorithm had long been the \(\mathrm{O}((n \log n)^{3/2})\) algorithm discovered by Brent and Kung (paper). As the complexity suggests, this algorithm is far slower than addition or multiplication, so composition was rarely used in competitive programming.
However, in 2024, Kinoshita (noshi91) and Li (Elegia) found an algorithm that achieves this composition in \(\mathrm{O}(n \log^2 n)\) (paper). This algorithm was so revolutionary and practically fast that it spread widely in (part of) the competitive programming community. (We will call this algorithm Kinoshita-Li composition.)
We will explain how this Kinoshita-Li composition works.
Prerequisite: transposition principle
There are two prerequisites to understanding Kinoshita-Li composition: one is the Bostan-Mori algorithm, and the other is the transposition principle. We already explained the Bostan-Mori algorithm in the editorial of ABC300-Ex, so we will not explain it here; instead, we will explain the transposition principle.
- Regarding transposition principle, see also an article by Nachia (in Japanese).
Overview
The “transposition” in the transposition principle refers to the transposition of a matrix. The transposition of an \(N \times M\) matrix \(A\) is the \(M \times N\) matrix obtained by moving the \((i, j)\) component to position \((j, i)\), and is denoted by \(A^T\). For example,
\[ \left( \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array} \right)^T = \left( \begin{array}{cc} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{array} \right). \]
(Not to be confused with the inverse matrix.)
The transposition principle is roughly as follows:
Transposition principle
There is a matrix \(A\) of size \(n \times m\).
Suppose that there is an algorithm that receives a length-\(m\) vector \(v\) and computes the length-\(n\) vector \(w = Av\) in complexity \(X\).
You want to obtain an algorithm that, given a length-\(n\) vector \(w\), computes the length-\(m\) vector \(v = A^T w\).
In fact, this can be computed in complexity \(X\), and the algorithm can be derived automatically. (!!)
Construction of algorithm
The construction of the algorithm to find \(A^T w\) using the transposition principle is as follows.
Suppose that the algorithm to compute \(Av\) can be decomposed step by step as a matrix product: \(A = A_1 A_2 \dots A_N\). Then \(A^T = A_N^T \dots A_2^T A_1^T\). Thus, by transforming as follows:
- decompose \(A\) into a matrix product: \(A_1, A_2, \dots, A_N\),
- reverse the order: \(A_N, \dots, A_2, A_1\), and
- transpose each action: \(A_N^T, \dots, A_2^T, A_1^T\),
one can apply the algorithms corresponding to the matrices after this transformation, in this order, to find \(A^T w\).
Decomposition of actions and transposition
Here, we decomposed \(A\) into \(A_1, A_2, \dots\). What are the atomic operations of this decomposition?
There are three atomic operations:
- swap \(x\) and \(y\),
- \(x \gets m \times x\), and
- \(x \gets x + m \times y\),
where \(x\) and \(y\) are variables (vector components), and \(m\) is a constant. As readers familiar with linear algebra will recognize, every variable that appears in an algorithm is a linear combination of the components of the input vector if and only if the algorithm can be described using only the three atomic operations above. Thus, if the algorithm is a linear transformation, it can be described by such operations. The transpositions of these operations are as follows; they can probably be understood by considering the matrices corresponding to the operations.
- swap \(x\) and \(y\): remains invariant
- \(x \gets m \times x\): remains invariant
- \(x \gets x + m \times y\): becomes \(y \gets y + m \times x\)
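For example, here is a minimal self-contained sketch (with made-up constants) of transposing a tiny linear algorithm by reversing the steps and transposing each atomic operation:

// Forward algorithm: w = Av with A = {{3, 6}, {0, 1}},
// written as a sequence of atomic operations on (x, y).
void forward(long long &x, long long &y) {
  x += 2 * y;  // x <- x + 2y
  x *= 3;      // x <- 3x
}

// Transposed algorithm: v = A^T w with A^T = {{3, 0}, {6, 1}}.
// Same steps in reverse order, each replaced by its transpose.
void transposed(long long &x, long long &y) {
  x *= 3;      // transpose of (x <- 3x) is itself
  y += 2 * x;  // transpose of (x <- x + 2y) is (y <- y + 2x)
}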
However, when you actually apply the transposition principle, it is usual to transpose in larger chunks (matrices). Here is a quotation from an article by maspy (translated by en_translator):
Deriving \(F^T\) from the algorithm of \(F\) is in principle possible by enumerating all the arithmetic operations of \(F\), but such a derivation is troublesome if the algorithm is complex. In practice, it is common to derive the algorithm by decomposing it into algorithms consisting of larger steps instead of arithmetic-operation-wise, and consider the transposition of each step (without applying the transposition principle).
Example: transposition of polynomial multiplication
Let \(A(x)\) be a polynomial of degree \(n\), and \(B(x)\) a polynomial of degree \(m\). We will consider the transposition of the algorithm that finds \(C(x) = A(x) \times B(x)\) in \(\mathrm{O}(nm)\) time.
Here, if we regard both \(A\) and \(B\) as variables (= the input vector), the process cannot be represented as matrix actions, and the transposition principle cannot be applied. Instead, we regard \(A\) as the variable (corresponding to the vector) and \(B\) as a constant. That is, we consider the algorithm that finds \(c \gets Ba\), where
\[a = \left( \begin{array}{c} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_n \end{array} \right), c = \left( \begin{array}{c} C_0 \\ C_1 \\ C_2 \\ \vdots \\ C_{n+m} \end{array} \right), \]
\[ B = \left( \begin{array}{cccc} B_0 & 0 & \dots & 0 & 0 \\ B_1 & B_0 & \dots& 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots& B_m & B_{m-1} \\ 0 & 0 & \dots& 0 & B_m \\ \end{array} \right). \]
Before the transposition, the algorithm looks like this:
for (int k = 0; k <= n + m; k++) {
  for (int i = max(0, k - m); i <= min(k, n); i++) {
    c[k] += a[i] * b[k - i];
  }
}
By transposing this naively, operation by operation and in reverse order, we obtain:
for (int k = n + m; k >= 0; k--) {
  for (int i = min(k, n); i >= max(0, k - m); i--) {
    a[i] += c[k] * b[k - i];
  }
}
This is indeed an algorithm that computes \(a \gets B^T c\).
It is also known that the algorithm before the transposition, which finds \(c \gets B a\), can be done in \(\mathrm{O}((n+m) \log (n+m))\) time, using FFT (Fast Fourier Transform). By decomposing this algorithm using FFT into atomic operations and transposing them, we can find that the algorithm after the transposition, which finds \(a \gets B^T c\), can also be achieved in \(\mathrm{O}((n+m) \log (n+m))\) time. (Actually decomposing FFT into atomic operations is too bothersome, so when applying transposition, we regard FFT as a single block.)
- As a general topic not directly related to the transposition principle, it is widely known that one can compute \(B^T c\) in \(\mathrm{O}((n+m) \log (n+m))\) time by reversing \(b\) and applying FFT. However, one can apply the transposition principle to obtain an algorithm with half the constant factor, which is called middle product.
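To make the identity concrete, the following is a naive reference (a sketch with plain long long; a modint and FFT would be used in practice) showing that \(a \gets B^T c\) equals the middle coefficients of \(\mathrm{rev}(b) \times c\):

// Computes a = B^T c as the coefficients of x^m, ..., x^{n+m} of
// rev(b) * c, where deg b = m and deg c = n + m.
vector<long long> transposed_mul(const vector<long long>& b,
                                 const vector<long long>& c) {
  int m = (int)b.size() - 1, n = (int)c.size() - 1 - m;
  vector<long long> rb(b.rbegin(), b.rend());  // rev(b)
  vector<long long> prod(n + 2 * m + 1, 0);    // rev(b) * c, computed naively
  for (int i = 0; i <= m; i++)
    for (int j = 0; j <= n + m; j++) prod[i + j] += rb[i] * c[j];
  // (B^T c)_i = sum_k b_{k-i} c_k = [x^{m+i}] (rev(b) * c)
  return vector<long long>(prod.begin() + m, prod.begin() + m + n + 1);
}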
The discussion so far may have been hard to follow (mainly because it is unclear how the transposition actually “works”), but understanding the transposition principle allows one to derive nontrivial algorithms, and it is what led to the discovery of Kinoshita-Li composition.
This concludes the explanation of the transposition principle. Next, we explain Kinoshita-Li composition.
Power Projection (transposition of Kinoshita-Li composition)
First, we will introduce a problem called Power Projection. (The algorithm for this problem is in fact the transposition of Kinoshita-Li composition.)
power projection
You are given polynomials \(f(x)\) and \(g(x)\) of degree \(n\). Let
\[h_i = \lbrack x^n \rbrack f(x)^i g(x).\]
Enumerate \(h_0, h_1, \dots, h_n\).
Power Projection can be solved in \(\mathrm{O}(n \log^2 n)\) time.
First, we consider the case where \(\lbrack x^0 \rbrack f(x) \neq 0\). Let \(\lbrack x^0 \rbrack f(x) = c\); then
\[\lbrack x^k \rbrack f(x)^i g(x) = \sum_{j = 0}^i \binom{i}{j} c^j \lbrack x^k \rbrack (f(x)-c)^{i-j} g(x),\]
so the answers can be recovered from the answers for \(f(x) \gets f(x)-c\) using a convolution. Thus, we may assume that \(\lbrack x^0 \rbrack f(x) = 0\).
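In code, this reduction is a single binomial convolution. A sketch, assuming a generic convolution() helper (such as the one in AC Library) and the same mint type as the rest of this editorial:

// Given hp[j] = [x^n] (f(x) - c)^j g(x) for j = 0, ..., n, recover
// h[i] = [x^n] f(x)^i g(x) via
//   h[i] = sum_j binom(i, j) c^{i-j} hp[j]
//        = i! * sum_j (hp[j] / j!) * (c^{i-j} / (i - j)!).
template <typename mint>
vector<mint> undo_constant_shift(const vector<mint>& hp, mint c) {
  int n = (int)hp.size() - 1;
  vector<mint> fact(n + 1), a(n + 1), e(n + 1);
  fact[0] = 1;
  for (int i = 1; i <= n; i++) fact[i] = fact[i - 1] * mint(i);
  mint pw = 1;
  for (int i = 0; i <= n; i++) {
    a[i] = hp[i] * fact[i].inverse();  // hp[j] / j!
    e[i] = pw * fact[i].inverse();     // c^k / k!
    pw *= c;
  }
  vector<mint> h = convolution(a, e);  // assumed helper
  h.resize(n + 1);
  for (int i = 0; i <= n; i++) h[i] *= fact[i];
  return h;
}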
If we represent the sought answers as a bivariate FPS, it suffices to find the following value as a polynomial in \(y\) up to degree \(n\):
\[[x^n] \sum_{i=0}^\infty y^i f(x)^i g(x) = \lbrack x^n \rbrack \frac{g(x)}{1 - y f(x)}.\]
If we apply the Bostan-Mori algorithm to this expression, the degree in \(y\) doubles at each step, while the degree in \(x\) halves. In other words, at the \(t\)-th iteration, the degree in \(x\) is \(\lfloor n/2^t \rfloor\) and the degree in \(y\) is \(2^t\). Thus, the polynomial multiplication in each iteration costs \(\mathrm{O}(n \log n)\). Since \(\mathrm{O}(\log n)\) iterations are required, the overall complexity is \(\mathrm{O}(n \log^2 n)\).
The implementation of this algorithm is as follows.
- We use the FPS library in Nyaan’s Library. pre(n) is the function that extracts the first \(n\) terms of a value of fps type. If you do not understand the other parts, see also the library.
- Note that we do not apply constant-factor optimizations for the sake of explanation, so this implementation is slow.
// Enumerates [x^n] f(x)^i g(x) for i = 0, 1, ..., n
// For simplicity, assume [x^0] f = 0
template <typename mint>
FormalPowerSeries<mint> power_projection(FormalPowerSeries<mint> f,
                                         FormalPowerSeries<mint> g) {
  using fps = FormalPowerSeries<mint>;
  assert(f.size() == g.size());
  int n = f.size() - 1, k = 1;
  vector<fps> P(n + 1, fps(k + 1));
  vector<fps> Q(n + 1, fps(k + 1));
  Q[0][0] = 1;
  for (int i = 0; i <= n; i++) P[i][0] = g[i];
  for (int i = 0; i <= n; i++) Q[i][1] = -f[i];
  while (n) {
    auto R = Q;
    for (int i = 1; i <= n; i += 2) R[i] = -R[i];
    auto S = multiply2d(P, R);
    auto T = multiply2d(Q, R);
    vector<fps> U(n / 2 + 1, fps(k * 2 + 1));
    vector<fps> V(n / 2 + 1, fps(k * 2 + 1));
    for (int i = 0; i <= n / 2; i++) U[i] = S[i * 2 + n % 2], V[i] = T[i * 2];
    P = U, Q = V;
    n /= 2, k *= 2;
  }
  return (P[0] * Q[0].inv()).pre(f.size());
}
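As a sanity check, here is a naive \(\mathrm{O}(n^3)\) reference that computes the same values with plain vectors; it is only meant for testing the fast version against:

// Naive reference: h[i] = [x^n] f(x)^i g(x) for i = 0, ..., n,
// where f and g both have size n + 1.
template <typename mint>
vector<mint> power_projection_naive(const vector<mint>& f,
                                    const vector<mint>& g) {
  int n = (int)f.size() - 1;
  vector<mint> pw(n + 1, 0), h(n + 1, 0);
  pw[0] = 1;  // f^0 = 1
  for (int i = 0; i <= n; i++) {
    for (int j = 0; j <= n; j++) h[i] += pw[j] * g[n - j];  // [x^n] f^i g
    vector<mint> nxt(n + 1, 0);  // pw <- pw * f mod x^{n+1}
    for (int a = 0; a <= n; a++)
      for (int b = 0; a + b <= n; b++) nxt[a + b] += pw[a] * f[b];
    pw = nxt;
  }
  return h;
}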
Kinoshita-Li composition
We will transpose Power Projection. Power Projection computes \((a_{i,j}) v\), where \(a_{i,j} = [x^j] f^i\) and \(v = \mathrm{rev}(g)\). What is the \(k\)-th component of the vector obtained by the transposed algorithm?
\[ \begin{aligned} \left((a_{i, j})^T w\right)_k &= \left((a_{j,i})w\right)_k \\ &= \sum_{i=0}^n [x^k] f^i w_i \\ &= [x^k] \sum_{i=0}^n w_i f^i \end{aligned} \]
If we let \(g(x) = \sum_{i=0}^n w_i x^i\), we arrive at
\[= [x^k] g(f(x)).\]
Thus, the transposition of Power Projection is a function composition!
Hence, by transposing the algorithm for Power Projection, we can obtain an \(\mathrm{O}(n \log^2 n)\) composition algorithm. The algorithm before transposition consists of convolutions, two-dimensional convolutions, and simple operations like assignments. The transposition of a convolution is reverse + convolution, as mentioned above, and that of a two-dimensional convolution is likewise reverse + convolution. Thus, using the transposition principle, one can derive the composition algorithm automatically.
The implementation of the algorithm described so far is the code below. It may not be easy to understand that this code is a transposition of Power Projection, but you can understand it by carefully following the variables.
- We use the FPS library in Nyaan’s Library. pre(n) is the function that extracts the first \(n\) terms of a value of fps type. If you do not understand the other parts, see also the library.
- Note that we do not apply constant-factor optimizations for the sake of explanation, so this implementation is slow.
template <typename mint>
vector<FormalPowerSeries<mint>> inner(const FormalPowerSeries<mint>& g,
                                      vector<FormalPowerSeries<mint>> Q, int n,
                                      int k) {
  using fps = FormalPowerSeries<mint>;
  if (n == 0) {
    fps h = g * Q[0].inv().rev();
    vector<fps> P(n + 1, fps(k + 1));
    for (int i = 0; i < (int)g.size(); i++) P[0][i] = h[i + Q[0].size() - 1];
    return P;
  }
  auto R = Q;
  for (int i = 1; i <= n; i += 2) R[i] = -R[i];
  auto T = multiply2d(Q, R);
  vector<fps> V(n / 2 + 1, fps(k * 2 + 1));
  for (int i = 0; i <= n / 2; i++) V[i] = T[i * 2];
  auto U = inner(g, V, n / 2, k * 2);
  vector<fps> S(n * 2 + 1, fps(k * 2 + 1));
  for (int i = 0; i <= n / 2; i++) S[i * 2 + n % 2] = U[i];
  vector<fps> revR(n + 1, fps(k + 1));
  for (int i = 0; i <= n; i++) revR[i] = R[n - i].rev();
  vector<fps> Pshift = multiply2d(S, revR);
  vector<fps> P(n + 1, fps(k + 1));
  for (int i = 0; i <= n; i++) {
    for (int j = 0; j <= k; j++) P[i][j] = Pshift[i + n][j + k];
  }
  return P;
}
// Computes g(f(x)) mod x^{n+1}
// For simplicity, assume [x^0] f(x) = 0
template <typename mint>
FormalPowerSeries<mint> composite(FormalPowerSeries<mint> f,
                                  FormalPowerSeries<mint> g) {
  using fps = FormalPowerSeries<mint>;
  assert(f.size() == g.size());
  int n = f.size() - 1, k = 1;
  vector<fps> Q(n + 1, fps(k + 1));
  Q[0][0] = 1;
  for (int i = 0; i < n + 1; i++) Q[i][1] = -f[i];
  auto P = inner(g, Q, n, k);
  fps h(n + 1);
  for (int i = 0; i <= n; i++) h[i] = P[i][0];
  return h.rev();
}
Kinoshita-Li inverse function
As a byproduct of Kinoshita-Li composition, there is also an algorithm that, given \(f(x) \bmod x^{n+1}\) (where \([x^0]f(x) = 0\) and \([x^1]f(x) \neq 0\)), finds its compositional inverse \(f^{\langle -1 \rangle}(x) \bmod{x^{n+1}}\). The original problem can also be solved with it, so we will explain it.
By Lagrange inversion theorem (for more details, please refer to editorial of ABC345G), we have
\[[x^n] f(x)^k = \frac{k}{n} [x^{n-k}] \left( \frac{x}{f^{\langle -1 \rangle}(x)}\right)^n.\]
Thus, by applying Power Projection to \(f(x)\), multiplying by appropriate constants, and reversing, we obtain \(\left( \frac{x}{f^{\langle -1 \rangle}(x)}\right)^n \bmod x^{n+1}\). All that is left is to use \(\log\) and \(\exp\) of FPS, and we obtain \(f^{\langle -1 \rangle}(x)\). (For more details, refer to the implementation by Nyaan.)
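The steps above can be sketched in code as follows. This uses power_projection from earlier; log, exp, inv, and mint's inverse() are assumed to behave as in Nyaan's library, so treat the exact method names as assumptions rather than a definitive implementation:

// Computes f^{<-1>}(x) mod x^{n+1}, given [x^0] f = 0 and [x^1] f != 0
// (assume n >= 1).
template <typename mint>
FormalPowerSeries<mint> compositional_inverse(FormalPowerSeries<mint> f) {
  using fps = FormalPowerSeries<mint>;
  int n = (int)f.size() - 1;
  fps g(n + 1);
  g[0] = 1;  // g = 1, so that h[k] = [x^n] f(x)^k
  fps h = power_projection(f, g);
  // Lagrange inversion: [x^{n-k}] (x / f^{<-1>}(x))^n = (n/k) [x^n] f(x)^k
  fps W(n);  // W = (x / f^{<-1>}(x))^n mod x^n
  for (int k = 1; k <= n; k++) W[n - k] = h[k] * mint(n) * mint(k).inverse();
  // [x^0] W = ([x^1] f)^n, so the n-th root with the correct constant term
  // is exp(log(W / [x^0]W) / n) * [x^1]f.
  fps V = (W * W[0].inverse()).log(n);
  V = (V * mint(n).inverse()).exp(n) * f[1];  // V = x / f^{<-1>}(x) mod x^n
  fps res = V.inv(n);                // f^{<-1>}(x) / x mod x^n
  res.insert(res.begin(), mint(0));  // multiply by x
  return res;                        // f^{<-1>}(x) mod x^{n+1}
}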
Solution of the problem
Now we explain the solution of the original problem.
We first rephrase the problem statement in a form that is easier to count. The graph \(G\) satisfies the condition in the problem statement if and only if:
- No cycles in \(G\) share a vertex with each other, and the length of every cycle is a prime number. \(\cdots (\ast)\)
(Proof) It is obvious that if \((\ast)\) holds, then the condition in the problem statement holds, so it suffices to show that \((\ast)\) holds whenever the condition in the problem statement holds. Here we claim the following lemma:
- Lemma: if the graph \(G\) contains cycles \(c_1\) and \(c_2\) that share a vertex, then \(G\) contains a circuit of even length.
If the lemma is shown, the desired proposition follows by taking the contrapositive appropriately.
If the length of \(c_1\) or \(c_2\) is even, then that cycle itself is an even-length circuit, so we consider the case where both lengths are odd.
(i) If \(c_1\) and \(c_2\) do not share an edge
In this case, there is a circuit that passes through every edge of \(c_1\) and \(c_2\) exactly once; its length \(|c_1| + |c_2|\) is even, so the lemma holds.
(ii) If \(c_1\) and \(c_2\) share an edge
Consider the graph consisting of the edges contained in exactly one of \(c_1\) and \(c_2\). Every vertex of this graph has even degree, so it decomposes into several cycles. If there is only one cycle, the lemma holds, because its length equals the number of edges of this graph, which has the same parity as \(|c_1| + |c_2|\) and is therefore even. Now suppose that there are two or more cycles, and let \(c_3\) be one with the shortest length.
If the length of \(c_3\) is even, then the lemma holds, so we assume that the length of \(c_3\) is odd. Without loss of generality, we may assume that \(|c_1| \geq |c_2|\). The graph consisting of the edges contained in exactly one of \(c_1\) and \(c_2\) has at most \(|c_1| + |c_2| - 2\) edges, so \(|c_3| \leq (|c_1| + |c_2| - 2) / 2 \lt |c_1|\). Since \(c_2\) and \(c_3\) clearly share an edge, we can take the pair \(c_2\) and \(c_3\) and reduce to a problem of smaller size.
By the discussion above, the lemma can be proved by induction on the sum of the numbers of edges of \(c_1\) and \(c_2\). Thus, the proposition has been proved. (End of proof)
By the proof above, it is sufficient to solve the following problem.
rephrased problem
How many \(N\)-vertex simple connected graphs \(G\) satisfy the following condition?
- the cycles in \(G\) do not share a vertex with each other, and their lengths are all prime numbers.
Consider solving this problem using generating functions.
As a typical approach to this kind of problem, let us consider the exponential generating function of the number of conforming graphs, additionally taking one vertex as a root. In other words, consider a generating function \(F(x)\) defined as
\[F(x) = \sum_{n=0}^\infty \frac{n a_n}{n!} x^n,\]
where \(a_n\) is the answer to the problem for \(N = n\).
- A similar kind of representation is obtained by labeling the vertices with \(0, 1, \dots, n-1\) and regarding the graph as rooted at vertex \(0\). In this representation, labels \(1\) through \(n-1\) can be freely attached to an \(n\)-vertex tree, so considering a generating function of the form \(\sum_n \frac{a_n}{(n-1)!} x^n\) often yields simple expressions. This expression coincides with the one above after all, so the same discussion applies.
Here, the generating function for the state where several children are attached to the root is \(x \exp(F(x))\). Also, if the cycle containing the root has size \(p\), we can regard the cycle as a length-\(p\) sequence of \(x \exp(F(x))\) starting from the root, contributing \((x \exp(F(x)))^p\). Here, one cycle corresponds to two such sequences (one for each direction of traversal), so we need to divide by two. Therefore, using
\[G(x) = x + \sum_{p \geq 3, p \text{ is prime}} \frac{x^p}{2},\]
\(F(x)\) can be represented as
\[F(x) = G(x \exp F(x)).\]
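As a quick sanity check for small \(N\): expanding \(F = G(x \exp F)\) with \(G(x) = x + \frac{x^3}{2} + \frac{x^5}{2} + \cdots\) term by term gives \(F(x) = x + x^2 + 2x^3 + \cdots\), and \([x^3] F = \frac{3 a_3}{3!} = 2\) yields \(a_3 = 4\). This matches a direct count: on three labeled vertices, the conforming connected graphs are the three trees and the triangle (a cycle of prime length \(3\)).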
The problem can be solved with Newton’s method. (For more details on Newton’s method, see the editorial of ABC345G.) That is, by writing
\[A(F) = G(x \exp(F(x))) - F(x) = 0,\]
we obtain
\[A'(F) = x \exp(F(x)) G'(x \exp F(x)) - 1,\]
and \(A(F)\) and \(A'(F)\) can be computed quickly using Kinoshita-Li composition, so Newton’s method works.
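A sketch of the whole Newton iteration, doubling the precision each round (assumptions: composite from above, and the exp, diff, inv, and pre of Nyaan's fps behave as used earlier):

// Computes F(x) mod x^{N+1} solving F = G(x exp F).
// The answer to the problem is then a_N = (N-1)! * [x^N] F.
template <typename mint>
FormalPowerSeries<mint> solve_F(int N) {
  using fps = FormalPowerSeries<mint>;
  fps G(N + 1);  // G(x) = x + sum_{p >= 3 prime} x^p / 2
  if (N >= 1) G[1] = 1;
  vector<bool> is_comp(N + 1, false);
  for (int p = 2; p <= N; p++) {
    if (is_comp[p]) continue;
    for (int q = 2 * p; q <= N; q += p) is_comp[q] = true;
    if (p >= 3) G[p] = mint(2).inverse();
  }
  fps F(1);  // F = 0 mod x^1
  for (int sz = 1; sz < N + 1;) {
    sz = min(2 * sz, N + 1);
    F.resize(sz);
    fps xeF = F.exp(sz);               // exp(F) mod x^sz
    xeF.insert(xeF.begin(), mint(0));  // x * exp(F)
    xeF.resize(sz);
    fps Gs(G.begin(), G.begin() + sz);
    fps dG = Gs.diff();
    dG.resize(sz);
    fps A = composite(xeF, Gs) - F;    // A(F) = G(x e^F) - F
    fps dA = (xeF * composite(xeF, dG)).pre(sz);
    dA[0] -= mint(1);                  // A'(F) = x e^F G'(x e^F) - 1
    F = (F - A * dA.inv(sz)).pre(sz);  // Newton step
  }
  return F;
}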
Alternatively, we can transform the equation to reduce the problem to the Kinoshita-Li inverse function. By letting
\[H(x) = \exp G(x),\]
we can obtain
\[x \exp F(x) = x H(x \exp F(x)),\]
\[\frac{x \exp F(x)}{H(x \exp F(x))} = x,\]
so \(x \exp F(x)\) is the compositional inverse of \(x / H(x)\). We can therefore find \(x \exp F(x)\) up to degree \(N+1\) using the Kinoshita-Li inverse function, and then use \(\log\) of FPS to find \(\lbrack x^N \rbrack F(x)\).
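This route can also be sketched in code (assuming compositional_inverse from the earlier sketch, and the log, exp, and inv of Nyaan's fps):

// Returns a_N = (N-1)! * [x^N] F, where x exp F(x) is recovered as the
// compositional inverse of x / H(x) with H = exp(G). Assumes N >= 1.
template <typename mint>
mint answer_via_inverse(int N, FormalPowerSeries<mint> G) {
  using fps = FormalPowerSeries<mint>;
  G.resize(N + 2);                   // work modulo x^{N+2}
  fps H = G.exp(N + 2);              // H = exp(G)
  fps t = H.inv(N + 2);              // 1 / H
  t.insert(t.begin(), mint(0));
  t.resize(N + 2);                   // x / H(x) mod x^{N+2}
  fps u = compositional_inverse(t);  // u = x exp(F) mod x^{N+2}
  fps eF(u.begin() + 1, u.end());    // exp(F) = u / x, constant term 1
  fps F = eF.log(N + 1);             // F mod x^{N+1}
  mint fact = 1;
  for (int i = 1; i < N; i++) fact *= mint(i);  // (N-1)!
  return fact * F[N];
}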
Either way, the problem can be solved in a total of \(\mathrm{O}(N \log^2 N)\) time, which is fast enough.