Official

C - 1122 Substring 2 Editorial by en_translator


Consider performing an exhaustive search for the position of the boundary between the two characters in a 1122-string.

Suppose that the boundary is fixed at the position between the \(i\)-th and \((i+1)\)-th characters of \(S\).

There is at least one applicable substring for this fixed boundary if and only if \(S_i\) plus \(1\) equals \(S_{i+1}\). When this condition is satisfied, the number of applicable substrings is \(\min(c_1,c_2)\), where \(c_1\) and \(c_2\) are the lengths of the run of \(S_i\) and \(S_{i+1}\), respectively.

The answer to the original problem is the sum of this count over all \(i\).

For example, when \(S=\) 11223, the answer can be found as follows:

  • If \(i=1\): \(S_1=\) 1 and \(S_2=\) 1, so it does not contribute to the answer.
  • If \(i=2\): \(S_2=\) 1 and \(S_3=\) 2. There are \(2\) consecutive 1’s and \(2\) consecutive 2’s, so the contribution to the answer is \(\min(2,2)=2\).
  • If \(i=3\): \(S_3=\) 2 and \(S_4=\) 2, so it does not contribute to the answer.
  • If \(i=2\): \(S_4=\) 2 and \(S_5=\) 3. There are \(2\) consecutive 2’s and \(1\) consecutive 3, so the contribution to the answer is \(\min(2,1)=1\).
  • Therefore, the sought answer is \(2+1=3\).

Thus, the problem can be solved if we can find the length of the run containing each \(S_i\).

There are several ways to find lengths of runs. In this editorial, we will introduce two main approaches.


1. Run-length encoding

The run-length encoding of a string is a sequence of pairs of a value and the number of its consecutive occurrences. For example, the run-length encoding of \(S=\) 11223 is \((\)1\(,2),(\)2\(,2),(\)3\(,1)\). The lengths of runs can be, as the name suggests, obtained as the run-length encoding. The run-length encode can be performed in \(O(N)\) time, so the problem can be solved in a total of \(O(N)\) time.


2. Processing naively

Consider solving the problem by the following algorithm:

  • For \(i=1,2,\ldots,N-1\), if \(S_i\) plus \(1\) is \(S_{i+1}\), do the following:
    • Let \(j=i\). While \(S_j=S_i\), subtract \(1\) from \(j\).
    • Let \(k=i+1\). While \(S_k=S_{k+1}\), add \(1\) to \(i\).
    • The resulting \(\min(i-j,k+1-i)\) contributes to the answer.
  • The sum of the contributions obtained above is the answer to the problem.

It is easy to see that the algorithm above yields the answer. Let us now analyze the complexity of this algorithm.

The bottleneck of the algorithm above is decrementing \(j\) and incrementing \(k\). However, \(j\) does not take the same value for different \(i\), so the cost of updating \(i\) sums to \(O(N)\). The same goes for \(k\). Thus, the overall time complexity is \(O(N)\).


Either solution 1. or 2. allows us to solve the problem in \(O(N)\) time.

The problem can be solved by appropriately implementing the algorithm above.

Sample code (Python 3) (Solution 1)

from itertools import groupby

s = input()
a = [(v, len(list(x))) for v, x in groupby(s)]
ans = 0
for i in range(len(a) - 1):
    if int(a[i][0]) + 1 == int(a[i + 1][0]):
        ans += min(a[i][1], a[i + 1][1])
print(ans)

Sample code (Python3) (Solution 2)

s = input()
n = len(s)
ans = 0
for i in range(n - 1):
    if int(s[i]) + 1 != int(s[i + 1]):
        continue
    j = i
    while j != -1 and s[j] == s[i]:
        j -= 1
    k = i + 1
    while k != n and s[k] == s[i + 1]:
        k += 1
    ans += min(i - j, k - i - 1)
print(ans)

posted:
last update: