Official

E - XOR Matching Editorial by evima

Speeding up XOR convolutions

This editorial assumes the content of the following editorial, particularly [3]:

This problem is fundamentally intended to be solved by using multiple computation methods selectively, as described in the above editorial. However, it is also possible to obtain a correct answer by carefully applying constant-factor optimizations to a single approach. We introduce that here.


[1] Overview

Letting \(K\) be the number of distinct values in \(C\), the approach can be organized as follows:

  • Create \(K\) sequences of length \(2^M\). Call them \(S_1, S_2, \ldots, S_K\).
  • For each \(S_j\), add a suitable constant multiple of \(\mathrm{conv}(S_j,S_j)\) to \(\mathrm{ANS}\) (where \(\mathrm{conv}\) denotes XOR convolution).

The goal is to speed up this approach through algorithmic techniques.


[2] Number of Hadamard transforms in a naive implementation

\(\mathrm{conv(A,B)}\) can be computed in \(O(M2^M)\) time using the Hadamard transform. Specifically, the procedure is:

HA := hadamard_transform(A)
HB := hadamard_transform(B)
HC := pointwise product of HA and HB (HC[i]=HA[i]*HB[i])
conv_AB := hadamard_transform(HC) / 2^M

The computation involves three Hadamard transforms, and \(3K\) Hadamard transforms are needed in total.


[3] Saving Hadamard transforms (1)

What is needed in this problem is \(\mathrm{conv}(S,S)\), that is, the case \(A=B\). Therefore, by first eliminating the redundancy of repeatedly applying the Hadamard transform to the same sequence, the number of Hadamard transforms required for \(\mathrm{conv}(S,S)\) can be reduced to \(2\).

for S in [S1, S2, ..., SK]:
  HS := hadamard_transform(S)
  HT := pointwise square of HS
  ANS += (some coefficient) *  hadamard_transform(HT) / 2^M

In total, \(2K\) Hadamard transforms are needed.


[4] Saving Hadamard transforms (2)

Furthermore, it is inefficient that \(\mathrm{ANS}\) is being computed by adding up \(K\) Hadamard transforms, and these can be combined as follows:

for S in [S1, S2, ..., SK]:
  HS := hadamard_transform(S)
  HT := pointwise square of HS
  tmp += (some coefficient) * HT
ANS := hadamard_transform(tmp) / 2^M

With this approach, only \(K+1\) Hadamard transforms are needed. Compared to the naive method of iterating XOR convolution(A,B), a speedup of roughly three times is expected.

The time limit for this problem is set such that, with this kind of constant-factor optimization and an implementation close to the fastest on Library Checker, it falls near the boundary between AC and TLE in the execution environment at the time of the contest.


[5] Saving Hadamard transforms (3)

By performing multiple Hadamard transforms in parallel using a single integer type, a further speedup of several times can be expected. For example, by associating a triple of integers \((a,b,c)\) with absolute values less than \(2^{19}\) with a single integer \((a+2^{20}b+2^{40}c)\), the number of Hadamard transforms can be further reduced to roughly \(1/3\) (though careful implementation is needed to handle negative integers).

posted:
last update: