Official

E - Keep Being Substring Editorial by evima


Let +L, -L, +R, and -R denote the four operations on \(X\): addition to the beginning, deletion from the beginning, addition to the end, and deletion from the end, respectively.

・If \(X\) and \(Y\) have a common integer

Let \(S\) be the longest sequence that is a contiguous subsequence of both \(X\) and \(Y\). Then, the optimal sequence of operations looks as follows.

  • First, -L and -R are repeated so that \(X\) matches \(S\).
  • Then, +L and +R are repeated so that \(X\) matches \(Y\).

It consists of \(|X| + |Y| - 2|S|\) operations.

\(S\) can be found in \(\mathrm{O}(|X|+|Y|)\) time using a suffix array and the LCP array.

・If \(X\) and \(Y\) do not have a common integer

Here, the following additional rule can be imposed without affecting the optimal solution. (The proof is at the end of this editorial.)

When \(X\) and \(Y\) have no common integer and \(|X| \geq 2\), only -L and -R may be performed.

With this rule, one can only match the given \(X\) to \(Y\) in the following manner.

  1. First, repeat -L and -R until \(X\) has a length of \(1\).
  2. Repeat \((\ast)\) below as long as \(X\) and \(Y\) have no common integer:
    • Perform +L or +R and then perform -L or -R. \(\cdots (\ast)\)
  3. Match \(X\) to \(Y\) by repeating +L and +R.

Steps 1. and 3. always consist of \(|X|-1\) and \(|Y|-1\) operations, respectively, so let us minimize the number of operations in step 2.

Each time \((\ast)\) is performed, \(X\) changes from a sequence of length \(1\) into another of length \(1\). For different integers \(u\) and \(v\), it is possible to transition from the state \(X = (u)\) to the state \(X = (v)\) by performing \((\ast)\) once if and only if:

\(A\) has the sequence \((u, v)\) or \((v, u)\) as a contiguous subsequence.

This can be rephrased into the following.

In the undirected graph \(G\) whose vertex set is \(\lbrace 1, 2, \ldots, N \rbrace\) and whose edge set is \(\lbrace 1, 2, \ldots, N \rbrace\), \(u\) and \(v\) are adjacent to each other.

Additionally, the only element of \(X\) at the beginning of 2. can be freely chosen from \(\mathcal{X} := \lbrace X_1, X_2, \ldots, X_P \rbrace\) according to what is left in 1., and the only element of \(X\) at the end of 2. can be any of \(\mathcal{Y} := \lbrace Y_1, Y_2, \ldots, Y_Q \rbrace\). Therefore, the minimum number of operations in 2. can be found as \(2D\) by solving the shortest path problem to find:

\[ D := \min \lbrace \mathrm{dist}(s, t) : s \in \mathcal{X}, t \in \mathcal{Y} \rbrace. \]

The proof that the additional rule does not affect the optimal solution

Let \(op_1, op_2, \ldots, op_n\) be an optimal sequence of operations that match \(X\) to \(Y\). As shown below, one can obtain an optimal sequence of operations respecting the additional rule by repeating the following action: as long as the sequence has operations violating the additional rule, swap the last of them and the subsequent operation.

Assume that the last operation violating the additional rule is \(op_i = \) +R. Just before executing \(op_i\), \(X\) and \(Y\) has no common integer, so \(op_i\) cannot be the last operation, and the subsequent operation \(op_{i+1}\) exists.

  • If the integer added by \(op_i\) is contained in \(Y\), it follows from the optimality of the sequence of operations that \(op_{i+1} = \) -L.
  • If the integer added by \(op_i\) is not contained in \(Y\), it follows from the following reasons that \(op_{i+1} = \) -L again.
    • If \(op_{i+1} = \) -R, a shorter sequence of operations could be obtained by cancelling out \(op_i\) and \(op_{i+1}\), contradicting with the optimality.
    • If \(op_{i+1} = \) +L or \(op_{i+1} = \) +R, then \(op_{i+1}\) would also violate the additional rule, contradicting with the choice of \(i\).

Since \(|X| \geq 2\) just before executing \(op_{i}\), the sequence of operations obtained by swapping \(op_{i}\) = +R and \(op_{i+1}\) = -L is also feasible and optimal.

For the case where \(op_i = \) +L, one can use a symmetric argument to the above.

The sequence after the swap has one fewer pair of integers \((i, j)\) such that \(1 \leq i \lt j \leq n\), \(op_i =\) +L or +R, and \(op_j =\) -L or -R than the sequence before the swap. Thus, this repetition of swaps halts after a finite number of swaps, yielding an optimal sequence of operations respecting the additional rule.

posted:
last update: