Official

E - Joint Two Strings Editorial by en_translator


For each \(S_i\), define

  • \(A_i\) as the maximum length of prefix of \(T\) contained as a subsequence of \(S_i\);
  • \(B_i\) as the maximum length of suffix of \(T\) contained as a subsequence of \(S_i\);

We first compute the sequences \(A = (A_1, A_2, \ldots, A_N)\) and \(B = (B_1, B_2, \ldots, B_N)\).

For each \(i\), one can obtain \(A_i\) by scanning \(S_i\) from its beginning and count how many first characters of \(T\) occur in \(S_i\). More specifically:

  • First, let the sought character be \(T_1\) the first character of \(T\), and scan \(S_i\) from its beginning.

  • While scanning \(S_i\), if the sought character \(T_1\) was found, change the sought character to \(T_2\), the second character of \(T\), and continue scanning \(S_i\) from there.

  • While scanning \(S_i\), if the sought character \(T_2\) was found, change the sought character to \(T_3\), the third character of \(T\), and continue scanning \(S_i\) from there.

  • \(\cdots\)

Finally, let \(A_i\) be the number of times that a sought character was found. In this procedure, \(S_i\) is scanned once from its beginning, so it costs \(O(|S_i|)\) time. Thus, the time required to obtain entire \(A\) is evaluated to be \(O(\sum_{i = 1}^N |S_i|)\); since \(\sum_{i = 1}^N |S_i| \leq 5 \times 10^5\), this is fast enough.

One can also find \(B\) in the same manner; but this time, reverse each string \(S_i\) and \(T\) before the procedure.

Additionally, let \(C_j\) be the number of indices \(i\) such that \(B_i = j\), and precompute \(C = (C_0, C_1, \ldots, C_{|T|})\) in a total of \(O(N)\) time.

Now we are ready to solve the problem. We call a pair \((i, j)\) of integers between \(1\) and \(N\) good if \((i, j)\) satisfies the condition in the problem statement; i.e. the concatenation of \(S_i\) and \(S_j\) contains \(T\) as a subsequence. Now let us count the number of good pairs, which is the answer to the problem.

We first consider a fixed \(i\) and count indices \(j\) such that \((i,j)\) is a good pair. Since \((i,j)\) is a good pair if and only if \(A_i + B_j \geq |T|\), an index \(j\) forms a good pair \((i, j)\) if and only if \(j\) is greater than or equal to \(l_i \coloneqq \max\lbrace |T| - A_i, 0 \rbrace\). Here, the number of \(j\) such that \(B_j\) is \(l_i\) or greater can be represented as \(\sum_{k = l_i}^{|T|} C_k\) using the sequence \(C\). Hence, the answer to the problem is obtained as the sum of this value over all \(i\):

\[\sum_{i = 1}^N \sum_{k = l_i}^{|T|} C_k. \tag{1}\]

Here, the number of terms in the summation (1) above is bounded by

\[ \begin{aligned} \sum_{i = 1}^N (|T| - l_i + 1) &= \sum_{i = 1}^N (|T| - \max \lbrace |T| - A_i, 0 \rbrace + 1)\\ &\leq \sum_{i = 1}^N (|T| - (|T| - A_i)+ 1)\\ &= \sum_{i = 1}^N A_i + N\\ &\leq \sum_{i = 1}^N |S_i| + N \\ &\leq 5 \times 10^5 + 5 \times 10^5, \end{aligned} \]

which is small enough, so the expression (1) can be implemented as a double-loop corresponding to the two sigmas, and this implementation is fast enough.

The following is sample code in C++ language.

#include <iostream>
#include <algorithm>
using namespace std;
typedef long long ll;

ll n;
string s[500001], t;
ll a[500001], b[500001], c[500001];

ll calc(const string &s, const string &t)
{
  ll g = 0;
  for(auto c : s){
    if(g >= (int)t.size()) break;
    if(c == t[g]) g++;
  }
  return g;
}

int main(void){
  cin >> n >> t;
  for(int i = 1; i <= n; i++) cin >> s[i];
  
  for(int i = 1; i <= n; i++) a[i] = calc(s[i], t);
  reverse(t.begin(), t.end());
  for(int i = 1; i <= n; i++){
    reverse(s[i].begin(), s[i].end());
    b[i] = calc(s[i], t);
    c[b[i]]++;
  }
  
  ll ans = 0;
  for(int i = 1; i <= n; i++){
    ll l = max(0LL, (ll)t.size() - a[i]);
    for(int j = l; j <= t.size(); j++){
      ans += c[j];
    }
  }
  cout << ans << endl;
  
  return 0;
}

posted:
last update: