Ex - Marquee 解説 by en_translator
We say that two strings \(X\) and \(Y\) of the same length “match” if and only if each pair of characters at the same position are equal except when at least one of them is _. We also use the term “match” for a pair of characters.
Let \(T\) be a string of length \((N+2W-2)\) that contains all possible state of the board. For example, for input \(1\), we can take \(T=\) ABC....ABC.. This problem asks “how many length-\(W\) substring of \(T\) matches \(P\)?”
If \(P\) contains a wild card _, it can be solved with KMP method etc.; however, it can hardly be applied for this problem, which involves _.
Instead, we consider an approach like ABC196F. The essential idea of ABC196F is to consider a function \(f(x,y):=xy+(1-x)(1-y)\) of two digits \(x\) and \(y\) that are both \(0\) or \(1\), which takes \(1\) if \(x=y\) and \(0\) if \(x\neq y\), and to obtain \(\sum_k f(T_{i+k},P_{k})\) as a convolution. This approach is valid for this problem too.
Consider a functions \(f(x,y)\) that takes \(0\) if characters \(x\) and \(y\) match, and takes a positive value otherwise. Formally, take a function \(f(x,y)\) such that:
- \(f(x,y)=0\) if at least one of \(x\) and \(y\) is
_; - \(f(x,y)=0\) if \(x=y\);
- \(f(x,y)>0\) otherwise.
Then, two strings \(X\) and \(Y\) match if and only if \(\sum_k f(X_k,Y_k)=0\).
Consider a function \(f\) for which we can find \(\sum_k f(T_{i+k},P_k)\) for all \(i\) fast.
Assign \(0\) to the wild card _ and distinct non-zero values to each of the other characters, then (if we identify each character with the assigned integer) \(f(x,y)=(x-y)^2xy\) satisfies the properties described above.
Then,
\(\sum_k f(T_{i+k},P_k)=\sum_k(T_{i+k}^3P_i-2T_{i+k}^2P_i^2+T_{i+k}P_i^3)\), so \(\sum_k f(T_{i+k},P_k)\) for all \(i\) can be found in a total of \(O((N+W)\log(N+W))\) time from convolutions of appropriate sequences.
Hence, the problem has been solved in a total of \(O((N+W)\log(N+W))\) time.
Side notes
If you compute FFT on \(\bmod M\) such as \(998244353\), it may lead to a false positive when \(\sum_k f(T_{i+k},P_k)\) is a multiple of \(M\). If the assignment of integers to the characters is sufficiently random, the probability of a false-positive match is \(O(1/M)\), so it finds a wrong answer with a probability of \(O((N+W)/M)\); thus, the probability of not being accepted is not tolerable.
In order to get AC (accepted) with a sufficiently high probability, you need to try multiple moduli or assignments of values to the characters.
This time, \(T\) does not contain a wild card, so we can take \((x-y)^2[y\neq \)_\(]\) as \(f_k(x,y)\) (where \([\mathrm{Prop}]\) evaluates to \(1\) if \(\mathrm{Prop}\) is true and \(0\) if false). Moreover, we can assign integers between \(0\) and \(53\) to the characters so that \(\sum_k f(T_{i+k},P_k)<998244353\); this way, calculating only \(\bmod 998244353\) always yields the correct answer.
writer’s solution (C++)
writer’s solution (Python)
Alternative solution
Take a sequence of random numbers \((R_i)\), and let \(Z_k\) be \(0\) if \(P_k=\) _ and \(R_k\) if \(P_k\neq \) _. Also, fix an arbitrary map \(c\) from characters to values. The substring of \(T\) starting from the \(i\)-th character matches \(P\) only if \(\sum_k c(T_{i+k})Z_k=\sum_k c(P_k)Z_k\). The left hand side can be found for all \(i\) in a total of \(O((N+W)\log(N+W))\) time; the right hand side is a constant independent of \(i\), which can be naively found in an \(O(W)\) time.
When we check the equality \(\bmod M\), the probability of a false positive is \(O(1/M)\); just as the solution described above, it will be accepted with a sufficiently high probability by trying multiple kinds of \((R_i)\).
投稿日時:
最終更新: