Official

Ex - Sequence of Substrings Editorial by en_translator


Let \(S[l, r]\) the substring from \(l\)-th through \(r\)-th characters, \(s_l s_{l+1} \ldots s_r\).

This problem can be intuitively interpreted as follows.

You are asked to extract some consecutive substrings (hereinafter we simply call substrings). The segments from which the substrings are extracted should not overlap. Moreover, the sequence of the substrings in the order of original positions in \(S\),

\[ S[L_1, R_1], S[L_2, R_2], \ldots, S[L_K, R_K] \]

should be strictly monotonically increasing. At most how many substrings can be extracted from \(S\)?

We consider solving this problem with the following algorithm, which is similar to that of finding the Longest Increasing Subsequence (LIS) with Dynamic Programming (DP):

  1. First, let \((l_1, r_1), (l_2, r_2), \ldots, (l_M, r_M)\) be the sequence of pairs of integers \((l, r)\) satisfying \(1 \leq l \leq r \leq N\) in the lexicographical order of \(S[l, r]\) (with the ties broken by the decreasing order of \(l\)).
  2. Prepare a one-dimensional array \(\mathrm{dp}[\ast]\) and initialize as \(\mathrm{dp}[0] \leftarrow 0\) and \(\mathrm{dp}[i] \leftarrow -\infty\) for \(i = 1, 2, \ldots, N\).
  3. For each \(i = 1, 2, \ldots, M\), update as follows.
    • \(\mathrm{dp}[r_i] \leftarrow \max\lbrace \mathrm{dp}[r_i], \max\lbrace \mathrm{dp}[0], \mathrm{dp}[1], \mathrm{dp}[2], \ldots, \mathrm{dp}[l_i-1] \rbrace + 1 \rbrace\)
  4. The answer is obtained as \(\max \lbrace \mathrm{dp}[1], \mathrm{dp}[2], \ldots, \mathrm{dp}[N] \rbrace\).

When performing Step 1., for example, construct a Trie containing all the suffices \(S[1, N], S[2, N] , \ldots, S[N, N]\) and index each node in the order of pre-order traversal in Depth-First Search (DFS), so that the indices preserving the lexicographical order is obtained. This way, Step 1. can be achieved in a total of \(\mathrm{O}(N^2)\) time (or \(\mathrm{O}(N^2\log N)\) time).
Also, by managing the array \(\mathrm{dp}[\ast]\) with a segment tree, Step 2. through Step 4. can be achieved in a total of \(\mathrm{O}(N^2\log N)\) time.

Therefore, we can implement so that the algorithm above works in a total of \(\mathrm{O}(N^2\log N)\) time. However, it is difficult to fit this solution in the execution time limit, so we will consider optimizing this algorithm.

Here, let \(B\) be the maximum integer \(b\) satisfying \(1+2+3+\cdots+b \leq N\). In fact, we can impose an additional constraint that “the maximum length of substring is \(B\)” without changing the answer for the problem. This is because, for an arbitrary optimal solution \(\big((L_1, R_1), (L_2, R_2), \ldots, (L_K, R_K)\big)\), we can apply the following operation against this optimal solution so that for all \(i = 1, 2, \ldots, K\), it holds that \(R_i-L_i+1 \leq B\).

  • Repeat the following operation as long as there exists \(1 \leq i \leq K\) such that \(R_i-L_i+1 \gt B\):
    • For \(1 \leq i \leq K\) such that \(R_i-L_i+1 \gt B\), at least one of \(j = 1, 2, \ldots, i-1\) satisfies \((R_j-L_j+1)+1 \lt R_{j+1}-L_{j+1}+1\).
    • Choose one such \(j\) and replace \((L_{j+1}, R_{j+1})\) with \((L_{j+1}, R_{j+1}-1)\).

Therefore, the \(\mathrm{O}(N^2\log N)\) algorithm described above can be modified as follows, while the correct answer can still be obtained.

Instead of treating all pair of integers \((l, r)\) satisfying \(1 \leq l \leq r \leq N\), treat only the pairs \((l, r)\) satisfying \(r-l+1 \leq B\).

By this improvement, this problem can be solved in a total of \(\mathrm{O}(NB\log N) = \mathrm{O}(N\sqrt{N}\log N)\) time.

posted:
last update: