Official

F - XY Ladder LCS Editorial by evima


Here is the key observation: you can always make the length of the longest common subsequence (LCS) at least \(\displaystyle\left\lfloor\frac{2N}{3}\right\rfloor\). This can be seen by dividing the pair of input strings into intervals of length \(3\) (and the remainder) and observing the following in each interval.

  • For an interval of length \(1\), the length of the LCS is \(0\) in the worst case where different characters are paired.
  • For an interval of length \(2\), if the first characters of the two strings are the same, the length of the LCS is already at least \(1\); otherwise, you can always match one of them to the second character of the other string, so the length of the LCS is at least \(1\).
  • For an interval of length \(3\), if the first or third characters of the two strings are the same, the case will be reduced into the case with length \(2\), and the length of the LCS is at least \(2\); if the first characters and third characters are both different in the two strings, you can always match both of them to the second characters, so the length of the LCS is at least \(2\).

(For an interval of length \(4\), one of the worst cases is the pair of XXYY and YXYX, in which case the length of the LCS is \(2\) after any combination of swaps.)

From this observation, an optimal solution never matches two characters with a distance greater than \(\displaystyle\left\lceil\frac{N}{3}\right\rceil\). Thus, the problem can be solved by letting \(\mathrm{dp}(k, R) \ \ \left(0 \le k \le N,\ |R| \le \displaystyle\left\lceil\frac{N}{3}\right\rceil\right)\) be the tentative lexicographically smallest LCS made from the first \(k\) characters of the two strings while leaving the string \(R\) unmatched and available, and computing these values in ascending order of \(k\) by dynamic programming (DP). The total complexity will be \(\mathrm{O}(N 2^{\frac{N}{3}})\) by properly encoding \(R\) and \(\mathrm{dp}(k, R)\) into 64-bit integers and pre-computing the transitions that are independent of \(k\).

Encoding $R$ and $\mathrm{dp}(k, R)$ The strings of length at most $N$ consisting of X and Y can correspond one-to-one to the bit sequences representing positive integers less than $2^{N+1}$ in binary. Specifically, X and Y should be replaced with $1$ and $0$, respectively, and the topmost bit that is $1$ should just be seen as the representation of the length. For instance, XXY corresponds to $(1110)_2 = 14$, YY corresponds to $(100)_2 = 4$, and the empty string corresponds to $(1)_2 = 1$. Comparing integers in this representation is equivalent to comparing the original strings based on length and reversed lexicographical order (length is prioritized), so you can compute the transitions of the DP by comparing 64-bit integers normally.

Pre-computing the transitions In the above encoding, $R$ corresponds to a positive integer less than $2^{18}$ in the largest case. When matching an unused character, it is not disadvantageous to greedily use the leftmost character available. Thus, for each positive integer less than $2^{18}$, you can naively pre-compute the first positions of $0$ and $1$, ignoring the topmost $1$, in the binary representation of that integer to compute each transition in constant time. The complexity of the pre-computation is also $\mathrm{O}(N 2^{\frac{N}{3}})$.

Sample implementation

posted:
last update: