Editorial - AtCoder Beginner Contest 419

Contest Duration: 2025-08-16 21:00:00+0900 - 2025-08-16 22:40:00+0900 (local time) (100 minutes) Back to Home

Official

E - Subarray Sum Divisibility Editorial by en_translator

Every length-\(L\) subarray of \(A\) sums to a multiple of \(M\) if and only if the following are both satisfied:

\(A_1 + A_2 + \ldots + A_L\) is a multiple of \(M\).
\(A_i - A_{i + L}\) is a multiple of \(M\) for all \(i\) with \(1 \leq i \leq N - L\).

Thus, when we know the number of times we apply the operation for each \(A_1, A_2, \ldots, A_L\), then for \(i = L + 1, L + 2, \ldots, N\) the number of operations required is the minimum number of operations required to make \(A_{i - L}\) congruent to \(A_i\) modulo \(M\).

Let \(f_{i, j}\) be the minimum number of operations required to make all of \(A_i, A_{L + i}, A_{2L + i}, \ldots\) congruent to \(j\) modulo \(M\). This can be naively computed in \(O(NM)\) time.

We utilize the values \(f_{i, j}\) to perform DP (Dynamic Programming).

\(dp_{i,j}\) be the number of operations required to perform against \(A_1, A_2, \ldots, A_i\), so that the remainder of \((A_1 + A_2 + \ldots, A_i)\) divided by \(M\) equals \(j\). Here, “the number of operations required” is not only for \(A_1, A_2, \ldots, A_i\), but for all elements whose indices are congruent to any of \(1, 2, \ldots, i\) modulo \(L\). By the discussion above, fixing the number of operations against \(A_i\) uniquely determines the number of operations against \(A_{L + i}, A_{2L + i}, \ldots\), and that is why this definition is computable.

We describe the transitions of the DP in detail. If we determine that \(A_i\) will be made congruent to \(k\) modulo \(M\), then the number of operations required for \(A_i, A_{L + i}, A_{2L + i}, \ldots\) is \(f_{i, k}\) (by definition). Thus, we can fill the DP by \(dp_{i, ((j + k) \bmod M)} \leftarrow \min(dp_{i, ((j + k) \bmod M}), dp_{i - 1, j} + f_{i, k})\). The final answer is \(dp_{L, 0}\).

The DP can be processed in a total of \(O(LM^2)\) time, which is fast enough under an appropriate implementation.

Bonus: it can be solved in \(O(NM)\) time.

Sample code

#include <bits/stdc++.h>
using namespace std;
const int inf = 1000000001;
template<class T, class U> inline bool chmin(T& a, const U& b) { if (a > b) { a = b; return true; } return false; }

int main() {
	int n, m, l;
	cin >> n >> m >> l;
	vector<int> a(n);
	for (int i = 0; i < n; i++) cin >> a[i];
	vector<int> dp(m, inf);
	dp[0] = 0;
	for (int i = 0; i < l; i++) {
		vector<int> ep(m, inf);
		for (int j = 0; j < m; j++) {
			int cost = 0;
			for (int k = i; k < n; k += l) {
				if (j >= a[k]) cost += j - a[k];
				else cost += j - a[k] + m;
			}
			for (int k = 0; k < m; k++) chmin(ep[(k + j) % m], dp[k] + cost);
		}
		dp = move(ep);
	}
	cout << dp[0] << '\n';
}

posted:
last update: