公式

G - 221 Substring 解説 by en_translator


Simplifying the problem

Let \((v_1, m_1), \cdots, (v_n, m_n)\) be the run-length encoding of \(S\) (where \((v_i, m_i)\) represents \(m_i\) copies of \(v_i\)).

For a (contiguous) subarray that is a 221-sequence, its run-length encoding \((v_l, v_l), \cdots, (v_r, v_r)\) satisfies:

  • \(v_l \leq m_l\), \(v_r \leq m_r\)
  • \(v_i = m_i\) (\(l < i < r\))

Let \(T\) be the sequence obtained by replacing each pair in the run-length encoding of \(S\) as follows:

  • if \(v_i > m_i\): into \((0)\)
  • if \(v_i = m_i\): into \((v_i)\)
  • if \(v_i < m_i\): into \((v_i, 0, v_i)\)

Then there is a one-to-one correspondence as a index sequence \((l, \cdots, r)\) between:

  • a subarray of \(T\) that does not contain \(0\): \((v_l, \cdots, v_r)\)
  • a subarray of \(S\) that is a 221-sequence: \((v_l, v_l), \cdots, (v_r, v_r)\)

Thus, the problem is simplified to counting distinct subarrays of \(T\) that do not contain \(0\).

Considering the reduced problem

Let \(T[l\ :\ r]\) denote the subarray formed by the \(l\)-th through \(r\)-th digits of \(T\). Let \(T[p_1\ :\ |T|] < \cdots < T[p_{|T|}\ :\ |T|]\) be the suffix array of \(T\), and \(\mathrm{lcp}[i]\) be the length of the longest common prefix of \(T[p_{i-1}\ :\ |T|]\) and \(T[p_{i}\ :\ |T|]\).

Consider the number of subarrays not containing \(0\), that are not a prefix of any \(T[p_i\ :\ |T|]\) (\(i < k\)), but is a prefix of \(T[p_k\ :\ |T|]\). If we denote by \(\mathrm{zerofree}[k]\) the number of consecutive occurrences of non-\(0\) digits starting from the \(p_k\)-th digit, the count is:

  • \(0\) if \(\mathrm{zerofree}[k] \leq \mathrm{lcp}[k]\);
  • \(\mathrm{zerofree}[k] - \mathrm{lcp}[k]\) if \(\mathrm{zerofree}[k] > \mathrm{lcp}[k]\).

The answer is the sum of the value above over \(k = 1, \cdots, |T|\).

Implementation strategy

As a famous fact, the suffix array of a string whose alphabet size is \(\sigma\) can be found in \(O(N + \sigma)\) time, and the lengths of the longest common prefixes in \(O(N)\) time.

Also, the number of consecutive occurrences of non-\(0\) digits starting from the \(p_k\)-th digit, \(\mathrm{zerofree}[k]\), can be found in a total of \(O(N)\) time by backtracking from the end of the string.

By implementing this, the problem can be solved fast. The time complexity is \(O(N + \sigma)\).

Sample code (C++)

#include <iostream>
using std::cin;
using std::cout;
#include <vector>
using std::vector;
using std::pair;
using std::make_pair;
using std::min;
using std::max;
#include <set>
using std::set;

#include <atcoder/string>
using atcoder::suffix_array;
using atcoder::lcp_array;

typedef long long int ll;

const int SIGMA = 9;

ll n;
vector<int> a;

struct Item {
	ll len;
	int val;
	ll idx;
};

// run-length list of (length, value)
vector<Item> runlength (const vector<int> &a) {
	vector<Item> lens;
	ll i = 0;
	while (i < (ll)a.size()) {
		ll j = i;
		while (j < (ll)a.size() && a[j] == a[i]) j++;

		lens.push_back({len: j-i, val: a[i], idx: i});

		i = j;
	}

	return lens;
}

// count substrings of s that doesn't contain 0 (ignoring position difference)
ll count_zerofrees (const vector<int> &s) {
	// a substring of s that doesn't contain 0 <-1to1-> original 221substr
	const ll m = s.size();
	const vector<int> sa = suffix_array(s, SIGMA);
	const vector<int> lcp = lcp_array(s, sa);

	// max_zerofree[i] := argmax_len [ s[i, i+len) doesn't contain 0 ]
	vector<ll> max_zerofree(m+1);
	max_zerofree[m] = 0;
	for (ll i = m-1; i >= 0; i--) {
		if (s[i] == 0) {
			max_zerofree[i] = 0;
		} else {
			max_zerofree[i] = max_zerofree[i+1] + 1;
		}
	}

	ll sum = 0;
	for (ll i = 0; i < m; i++) {
		ll l = ((i == 0) ? 0 : lcp[i-1]);
		ll r = max_zerofree[sa[i]];
		
		// l < len <= r
		sum += max(0LL, r - l + 0);
	}

	return sum;
}

void solve () {
	// run-length list of (length, value)
	const vector<Item> lens = runlength(a);

	// maximal [l, r) where [lens[l], lens[r]) satisfies either:
	// 1. lens[k].val == lens[k].len, or
	// 2. lens[k].val <  lens[k].len AND k is in {l, r-1}
	vector<pair<ll, ll> > maximals;
	{
		ll idxlen = 0;
		while (idxlen < (ll)lens.size()) {
			if (lens[idxlen].val > lens[idxlen].len) {
				idxlen++;
				continue;
			}

			ll jlen = idxlen + 1;
			ll jsee = jlen;
			while (jlen < (ll)lens.size()) {
				if (lens[jlen].val > lens[jlen].len) {
					jlen += 0;
					jsee += 0;
					break;
				} else if (lens[jlen].val < lens[jlen].len) {
					jlen += 0;
					jsee += 1;
					break;
				} else {
					jlen += 1;
					jsee += 1;
					continue;
				}
			}

			maximals.push_back({idxlen, jsee});
			idxlen = jlen;
		}
	}

	// [0] ++ {[lens[li].val, lens[ri].val)}.join([0]) ++ [0]
	vector<int> s = {0};
	for (const auto &[li, ri] : maximals) {
		for (int i = li; i < ri; i++) {
			s.push_back(lens[i].val);
		}
		s.push_back(0);
	}

	ll ans = count_zerofrees(s);
	cout << ans << "\n";	
}

int main (void) {
	std::cin.tie(nullptr);
	std::ios_base::sync_with_stdio(false);

	cin >> n;
	
	a.resize(n);
	for (ll i = 0; i < n; i++) {
		cin >> a[i];
		assert(a[i] <= SIGMA);
	}

	solve();

	return 0;
}

投稿日時:
最終更新: