E - XOR Matching 解説 by evima
Speeding up XOR convolutionsThis editorial assumes the content of the following editorial, particularly [3]:
This problem is fundamentally intended to be solved by using multiple computation methods selectively, as described in the above editorial. However, it is also possible to obtain a correct answer by carefully applying constant-factor optimizations to a single approach. We introduce that here.
[1] Overview
Letting \(K\) be the number of distinct values in \(C\), the approach can be organized as follows:
- Create \(K\) sequences of length \(2^M\). Call them \(S_1, S_2, \ldots, S_K\).
- For each \(S_j\), add a suitable constant multiple of \(\mathrm{conv}(S_j,S_j)\) to \(\mathrm{ANS}\) (where \(\mathrm{conv}\) denotes XOR convolution).
The goal is to speed up this approach through algorithmic techniques.
[2] Number of Hadamard transforms in a naive implementation
\(\mathrm{conv(A,B)}\) can be computed in \(O(M2^M)\) time using the Hadamard transform. Specifically, the procedure is:
HA := hadamard_transform(A)
HB := hadamard_transform(B)
HC := pointwise product of HA and HB (HC[i]=HA[i]*HB[i])
conv_AB := hadamard_transform(HC) / 2^M
The computation involves three Hadamard transforms, and \(3K\) Hadamard transforms are needed in total.
[3] Saving Hadamard transforms (1)
What is needed in this problem is \(\mathrm{conv}(S,S)\), that is, the case \(A=B\). Therefore, by first eliminating the redundancy of repeatedly applying the Hadamard transform to the same sequence, the number of Hadamard transforms required for \(\mathrm{conv}(S,S)\) can be reduced to \(2\).
for S in [S1, S2, ..., SK]:
HS := hadamard_transform(S)
HT := pointwise square of HS
ANS += (some coefficient) * hadamard_transform(HT) / 2^M
In total, \(2K\) Hadamard transforms are needed.
[4] Saving Hadamard transforms (2)
Furthermore, it is inefficient that \(\mathrm{ANS}\) is being computed by adding up \(K\) Hadamard transforms, and these can be combined as follows:
for S in [S1, S2, ..., SK]:
HS := hadamard_transform(S)
HT := pointwise square of HS
tmp += (some coefficient) * HT
ANS := hadamard_transform(tmp) / 2^M
With this approach, only \(K+1\) Hadamard transforms are needed. Compared to the naive method of iterating XOR convolution(A,B), a speedup of roughly three times is expected.
The time limit for this problem is set such that, with this kind of constant-factor optimization and an implementation close to the fastest on Library Checker, it falls near the boundary between AC and TLE in the execution environment at the time of the contest.
[5] Saving Hadamard transforms (3)
By performing multiple Hadamard transforms in parallel using a single integer type, a further speedup of several times can be expected. For example, by associating a triple of integers \((a,b,c)\) with absolute values less than \(2^{19}\) with a single integer \((a+2^{20}b+2^{40}c)\), the number of Hadamard transforms can be further reduced to roughly \(1/3\) (though careful implementation is needed to handle negative integers).
- Sample solution: https://atcoder.jp/contests/arc222/submissions/76586635
投稿日時:
最終更新: