Editorial - AtCoder Grand Contest 047

Contest Duration: 2020-08-09 21:00:00+0900 - 2020-08-09 23:20:00+0900 (local time) (140 minutes) Back to Home

Official

C - Product Modulo Editorial by Errichto

Hints

Hint 1

There is something in math that makes multiplication modulo $P$ less chaotic. (yes, this is quite a mysterious hint)

Hint 2

Use primitive root (generator) modulo $P$.

Hint 3

Change every number $x$ to its position in sequence $g^0, g^1, g^2, \ldots$

Hint 4

then FFT

Idea

There is a way to reorder numbers $1, 2, \ldots, P-1$ in such a way that we can quickly multiply two numbers and get their product without operators * or %.

Solution

$g = 2$ is a primitive root modulo $P$. The following equation changes the problem into convolution solvable with FFT:

\[g^i \cdot g^j = g^{i+j}\]

Compute the sequence $g^0, g^1, \ldots, g^{P-2}$ and save the mapping from value to position which is just $log_g(x)$.

While reading the input, for every non-zero $x$ do cnt[position[x]]++. Compute convolution $cnt \times cnt$ with FFT.

For every $k$, you computed the number of pairs of elements $(x, y)$ in the input such that $x = g^r$, $y = g^t$ and $k = r + t$ (so $x \cdot y = g^k$). This is the number of pairs of elements with exactly this product modulo $P$. Well, $g^k$ and $g^{k+(P-1)}$ are the same number so the answer (for ordered pairs) is:

\[\sum_{k=0}^{P-2} (g^k \bmod P) \cdot (fftResult[k] + fftResult[k+(P-1)]) \]

To count only unordered pairs $(i < j)$, subtract products $(a_i \cdot a_i \bmod P)$ and at the end divide the answer by $2$ (with modular inverse). Don’t forget about long longs.

code, 40 lines + FFT

#include  // Product Modulo, by Errichto
using namespace std;
#define REP(i,n) for(int i = 0; i < int(n); ++i)
typedef double ld; // 'long double' is 2.2 times slower
struct C { ld real, imag;
	C operator * (const C & he) const {
		return C{real * he.real - imag * he.imag,
				real * he.imag + imag * he.real};
	}
	void operator += (const C & he) { real += he.real; imag += he.imag; }
};
void dft(vector & a, bool rev) {
	const int n = a.size();
	for(int i = 1, k = 0; i < n; ++i) {
		for(int bit = n / 2; (k ^= bit) < bit; bit /= 2);;;
		if(i < k) swap(a[i], a[k]);
	}
	for(int len = 1, who = 0; len < n; len *= 2, ++who) {
		static vector t[30];
		vector & om = t[who];
		if(om.empty()) {
			om.resize(len);
			const ld ang = 2 * acosl(0) / len;
			REP(i, len) om[i] = i%2 || !who ?
					C{cos(i*ang), sin(i*ang)} : t[who-1][i/2];
		}
		for(int i = 0; i < n; i += 2 * len)
			REP(k, len) {
				 const C x = a[i+k], y = a[i+k+len]
						* C{om[k].real, om[k].imag * (rev ? -1 : 1)};
				a[i+k] += y;
				a[i+k+len] = C{x.real - y.real, x.imag - y.imag};
			}
	}
	if(rev) REP(i, n) a[i].real /= n;
}
templatevector multiply(const vector & a, const vector & b) {
	if(a.empty() || b.empty()) return {};
	int n = a.size() + b.size();
	vector ans(n - 1);
	/* if(min(a.size(),b.size()) < 190) { // BRUTE FORCE
		REP(i, a.size()) REP(j, b.size()) ans[i+j] += a[i]*b[j];
		return ans; } */
	while(n&(n-1)) ++n;
	auto speed = [&](const vector & w, int i, int k) {
		int j = i ? n - i : 0, r = k ? -1 : 1;
		return C{w[i].real + w[j].real * r, w[i].imag
				- w[j].imag * r} * (k ? C{0, -0.5} : C{0.5, 0});
	};
	vector in(n), done(n);
	REP(i, a.size()) in[i].real = a[i];
	REP(i, b.size()) in[i].imag = b[i];
	dft(in, false);
	REP(i, n) done[i] = speed(in, i, 0) * speed(in, i, 1);
	dft(done, true);
	REP(i, ans.size()) ans[i] = is_integral::value ?
			llround(done[i].real) : done[i].real;
	//REP(i,ans.size())err=max(err,abs(done[i].real-ans[i]));
	return ans;
}

const int P = 200003;

int main() {
	int g = 2;
	vector order{1};
	for(int x = g; x != 1; x = (long long) x * g % P) {
		order.push_back(x);
	}
	assert((int) order.size() == P - 1);
	vector where(P);
	for(int i = 0; i < (int) order.size(); ++i) {
		where[order[i]] = i;
	}
	vector cnt(P);
	int n;
	scanf("%d", &n);
	vector a(n);
	for(int& x : a) {
		scanf("%d", &x);
		if(x != 0) {
			cnt[where[x]]++;
		}
	}
	vector multiplier(P);
	vector answer = multiply(cnt, cnt);
	for(int i = 0; i < (int) answer.size(); ++i) {
		if(answer[i]) {
			int product = order[i%order.size()];
			multiplier[product] += answer[i];
		}
	}
	for(int x : a) {
		--multiplier[(long long) x * x % P];
	}
	long long total = 0;
	for(int i = 1; i < P; ++i) {
		assert(multiplier[i] % 2 == 0);
		total += multiplier[i] / 2 * i;
	}
	printf("%lld\n", total);
}

Faster modulo multiplication?

With fast memory access, precomputations with primitive root might speed up the calculation of $x \cdot y \bmod P$.

In C++, if you use a constant like const int P = 200003;, the modulo operations are anyway quite fast. But if you read $P$ from the input, replacing x * y % P with g_powers[pos[x]+pos[y]] is actually 2.5 times faster on author’s machine (5.3s and 2s for 1e9 multiplications, $P = 200\,003$). I don’t know how good this is compared to other known algorithms for speeding up operation %P.

For values of $P$ around $10^9$, this is viable only locally to compute something once or if it’s an optimization contest. You need 8GB and a few seconds for preprocessing.

posted:
last update: