G - Balanced Subarrays 解説 by en_translator
Original proposer: vwxyz
First, let us consider how to compute the former value (containing \(B_k\) copies of each number).
We try all position for the right end of the subarray.
For each fixed right end, there exists a lower bound for the left end such that no same value occurs \(B_k + 1\) or more times.
These lower bounds can be found for all right end in a total of \(O(N)\) time, using the sliding window technique.
When the left end is not less than the lower bound, the condition is equivalent to that the number of occurrences modulo \(B_k\) is equal to \(0\) for all values.
Take a sufficiently large prime \(M\).
For each value, generate \(B_k\) random numbers that sum to a multiple of \(M\). For each occurrence of that value, from left to right, assign those random numbers to the position of occurrences, repeated every \(B_k\) elements. For each subarray, define its hash value as the sum of the assigned values, modulo \(B_k\). Then the numbers of occurrences of the values in a subarray are all equal modulo \(B_k\) if and only if its hash value is \(0\), with a sufficiently high probability.
Therefore, by taking the cumulative sums for the hash values and maintaining an associative array that maps the hash values to the indices, the sought count can be found in \(O(N)\) or \(O(N \log N)\) time.
Next, we consider how to compute the latter value (containing \(B_k\) distinct values).
We try all position for the right end of the subarray.
The left ends of the subarrays containing exactly \(B_k\) distinct values form a segment.
Take a sufficiently large prime \(M\). Assign a random value to each element as a hash, and define the hash value of a subarray (or a set) as the sum of the elements in it modulo \(M\).
Let \(S_i\) denote the sum of the hash values of the first \(i(0 \leq i \leq N)\) elements.
Also, when the right end is fixed, the set of integers that can be contained in a subsequence containing exactly \(B_k\) distinct integers is unique. Let \(H\) be the hash of that set.
Then, the subarray formed by the \((l+1)\)-th through \(r\)-th elements of \(A\) contains an equal number of occurrences for each value if and only if,
\(S_r-S_l \equiv H \times \frac{r-l}{B_k} \pmod{M}\),
with high probability.
This further transforms to
\(S_r \times B_k - H \times r=S_l \times B_k - H \times l\),
so the number of left ends can be found by maintaining the frequencies of \(S_i \times B_k - H \times i\) over all \(i\) within the segment.
When the right end is swept from left to right, if the set of \(B_k\) distinct elements remains the same, we keep the frequency table as it is; if it changes, we reconstruct it from scratch.
When the set changes, the new lower bound for the left end is greater than the old upper bound for the left end, so \(S_i \times B_k - H \times i\) is computed at most once for each \(i\).
By implementing appropriately, the total count can be found in \(O(N)\) or \(O(N \log N)\) time.
Bonus problem
Consider how to count the number of balanced sequences in \(O(N \sqrt N)\) time.
投稿日時:
最終更新: