Official

F - Square Subsequence Editorial by en_translator


We denote by \(\sigma(i, c)\) the leftmost position of a lowercase English letter \(c\) occurring in the \(i\)-th and later characters of \(S\): \(S_{i+1} S_{i+2} \ldots S_N\). If the \(i\)-th and later characters do not contain \(c\), we let \(\sigma(i, c) \coloneqq \infty\).

This problem asks to find the number of (distinct) non-empty subsequences \(T\) of \(S\) that is represented as \(TT\) by a string \(T\).

As described in Sample Input 2, there can be multiple ways to extract a string from \(S\), so we need to avoid counting the same subsequence multiple times. To avoid it, we impose the following rule:

\(\spadesuit\)): extract as left characters of \(S\) as possible

In other words, when we extract a subsequence \(TT = T_1T_2\ldots T_nT_1T_2\ldots T_n\) from \(S\),

we first focus on the first \(T\) of \(TT\):

  • extract the \(p_1\)-th character of \(S\), where \(p_1 \coloneqq \sigma(0, T_1)\);
  • extract the \(p_2\)-th character of \(S\), where \(p_2 \coloneqq \sigma(p_1, T_2)\);
  • extract the \(p_3\)-th character of \(S\), where \(p_3 \coloneqq \sigma(p_2, T_3)\);
  • \(\cdots\)
  • extract the \(p_n\)-th character of \(S\), where \(p_n \coloneqq \sigma(p_{n-1}, T_n)\).

Then we focus on the second \(T\) of \(TT\):

  • extract the \(q_1\)-th character of \(S\), where \(q_1 \coloneqq \sigma(p_n, T_1)\);
  • extract the \(q_2\)-th character of \(S\), where \(q_2 \coloneqq \sigma(q_1, T_2)\);
  • extract the \(q_3\)-th character of \(S\), where \(q_3 \coloneqq \sigma(q_2, T_3)\);
  • \(\cdots\)
  • extract the \(q_n\)-th character of \(S\), where \(q_n \coloneqq \sigma(q_{n-1}, T_n)\);

This rule uniquely determines the way to extract a subsequence for every substring of \(S\), so each subsequence \(T\) can be counted without duplicates. Specifically, we try to count the following number for each fixed integer \(x\):

(\(\star\)): the number of subsequences \(TT\) such that \(q_1 = x\) when extracting it from \(S\) along the rule (\(\spadesuit\)).

By taking the sum of (\(\star\)) over all possible positions \(x\) that can be \(x\).

For a fixed \(q_1 = x\), we have \(T_1 \coloneqq S_x\) and \(p_1 \coloneqq \sigma(0, T_1) = \sigma(0, S_x)\). Consider determining the \(2\)-nd and succeeding characters of \(T\) in order while extracting from \(S\) the first half \(T\) of \(TT\) starting from \(p_1\) and the second half \(T\) of \(TT\) starting from \(q_1\), in parallel. Precisely,

  • We first determine which lowercase English character should be \(T_2\). Let \(p_2 = \sigma(p_1, T_2)\) and \(q_2 = \sigma(q_1, T_2)\), and extract the \(p_2\)-th and \(q_2\)-th characters of \(S\).
  • Then we determine which lowercase English character should be \(T_3\). Let \(p_3 = \sigma(p_2, T_3)\) and \(q_3 = \sigma(q_2, T_3)\), and extract the \(p_3\)-th and \(q_3\)-th characters of \(S\).
  • \(\cdots\)
  • Then we determine which lowercase English character should be \(T_n\). Let \(p_n = \sigma(p_{n-1}, T_n)\) and \(q_n = \sigma(q_{n-1}, T_n)\), and extract the \(p_n\)-th and \(q_n\)-th characters of \(S\).

In order to find (\(\star\)), it is sufficient to count the strings \(T\) constructed by the process above such that \(\sigma(p_n, T_1) = q_1\) (a necessary condition of the rule \(\spadesuit\)).

This can be computed by a Dynamic Programming (DP). Specifically, define a DP table by

\(dp[i][j] = \) (the number of \(T = T_1T_2\ldots T_n\) such that \(p_n = i\) and \(q_n = j\));

then, for each \((i, j)\), perform the following \(26\) transitions:

\(dp[\sigma(i, c)][\sigma(j, c)] \leftarrow dp[\sigma(i, c)][\sigma(j, c)] + dp[i][j]\),

which corresponds to the choice of the \((n+1)\)-th character \(T_{n+1}\) of \(T\). (Here, we ignore \(c\) such that \(\sigma(i, c) = \infty\) or \(\sigma(j, c) = \infty\).)

The desired (\(\star\)) is obtained as the number of \(T\) such that \(\sigma(p_n, T_1) = q_1\); that is, \(\sum_{\sigma(i, S_x) = x}\sum_j dp[i][j]\).

The time complexity of this DP is \(O(N^2)\). Computing this for all positions \(x\) that can be \(q_1\) (\(O(N)\) cases), we can solve this problem in a total of \(O(N^3)\) time.

posted:
last update: