Official

F - Common Prefixes Editorial by en_translator


We denote the length of string \(S\) by \(|S|\). Also, we use the term similarity and the notation \(f\), \(S_i\) just as defined in the problem statement.

We assume that indices of arrays are 0-based.

The idea of the solution

Since we want to find the sum of lengths of common prefixes of suffixes, using LCP Array (Longest Common Prefix Array) seems to be a good idea.

Suffix Array

Consider sorting the suffixes of a string \(S\) in the lexicographical order.

For example, the suffixes of aacab are aacab, acab, cab, ab, b, (empty string), and when lexicographically sorted, they become (empty string), aacab, ab, acab, b, cab.

If we store actual string to represent the sorted array, it requires \(O(|S|^2)\) memory; however we only need to store the position (the index in \(S\) of the first character) of the prefixes, so it actually need only \(O(|S|)\) memory. In the example above, what we want to find is \((5,0,3,1,4,2)\). We call this array “Suffix Array.” Let us denote the array by \(SA\).

If we actually compare the suffix strings in order to obtain the Suffix Array, it will require \(O(|S|^2\log |S|)\) time, but with an appropriate algorithm it can be constructed in an \(O(|S|)\) time (regarding the number of distinct alphabets as a constant). (This algorithm is called SA-IS. We will not describe the details. There are some other algorithms that requires \(O(|S|\log |S|)\) or \(O(|S|(\log |S|)^2)\) time but are relatively easy to implement)

LCP Array

Now we consider the similarity of suffixes of two strings that are adjacent to each other when sorted lexicographically. For example, in the example above,

  • The similarity between (空文字列) and aacab is \(0\)
  • The similarity between aacab andab is \(1\)
  • The similarity between ab and acab is \(1\)
  • The similarity between acab and b is \(0\)
  • The similarity between b and cab is \(0\)

The resulting array is called LCP Array. In the example above, it is \((0,1,1,0,0)\). If we have obtained Suffix Array, the LCP Array can be constructed in an \(O(|S|)\) time with an appropriate algorithm. We denote this array by \(LCPA\).

Calculating the similarity of suffixes

If three strings \(A,B,C\) are in the lexicographical order of \(A < B < C\), then \(f(A,C)=\min(f(A,B),f(B,C))\). So, for any \(i < j\) we have

\(f(S_{SA_i},S_{SA_j})=\min(f(S_{SA_i},S_{SA_{i+1}}),\ldots,f(S_{SA_{j-1}},S_{SA_j})) =\min(LCPA_i,\ldots,LCPA_{j-1}),\)

which means that the similarity of two suffixes can be obtained as a range min of LCP Array (unless those two are identical). Therefore, in order to solve the original problem, it is sufficient to solve the following problem:

Problem:
Given is a sequence \(A=(A_0,\ldots,A_{N-1})\) of length \(N\). \(A_0\) is \(0\).
For \(i<j\) let \(g(i,j)=\min(A_i,A_{i+1},\ldots,A_{j-1})\). Find
\(g(1,k)+g(2,k)+\ldots+g(k-1,k)+g(k,k+1)+g(k,k+2)+\ldots+g(k,N)\)
for each \(k=1,\ldots,N\).

(Note that \(k\) in this problem is \(SA_k\) in the original problem)

Solving the transformed problem

Let \(B_k\) be the first half of the desired expression, \(g(1,k)+g(2,k)+\ldots+g(k-1,k)\). Here we explain how to find \(B_k\). The latter half can be found in the same way.

We try to find \(B_k\) in the increasing order from \(k=1\). What is the relationship between \(B_k\) and \(B_{k+1}\)? Since

\(\begin{array}{} B_k&=&g(1,k)&+&\ldots&+&g(k-1,k)&\\ B_{k+1}&=&g(1,k+1)&+&\ldots&+&g(k-1,k+1)&+&g(k,k+1)\\ &=& \min(g(1,k),A_k) &+&\ldots&+&\min(g(k-1,k),A_k)&+&A_k, \end{array}\)

\(B_k\) can be found as follows.

  • Let \(X\) be a multiset that is initially empty. For each \(k=1,\ldots,N\), perform the following three operations:
    • 1. Replace all the elements \(x\) in \(X\) by \(\min(x,A_{k-1})\)
    • 2. Add \(A_{k-1}\) to \(X\)
    • 3. Find the sums of all the elements in \(X\). This is \(B_k\)

If we implement this naively, it costs \(O(N^2)\) time at worst. Instead, we can manage the pairs of values and their numbers. Since every Operation 1 decreases the numbers of distinct elements in \(X\), the number of operations is at most \(O(N)\), so the problem has been solved in a worst time of \(O(N \log N)\) with ordered multiset or priority queue.

Moreover, since the number added in Operation 2 is no less than \(\max X\) at that point, we can use a stack as a data structure of managing \(X\), in which case the time complexity is \(O(N)\).

Therefore, the problem has been solved in a total of \(O(N)\) time.

Note that the construction of Suffix Array and LCP Array is provided by ACL (AtCoder Library), so if you are using C++, it will be useful to use this.

posted:
last update: