## D - Almost Bubble Sort Editorial by evima

We will assign each element of the permutation a type, either \(L\) or \(R\). Then, the number of swaps required will be the inversion count obtained after adding \(+N\) to the elements of type \(R\).

First, let’s prove the following proposition:

- In the optimal solution, there does not exist a pair \((i, j)\) such that \(i < j\), \(P_i > P_j\), \(P_i\) is of type \(R\), and \(P_j\) is of type \(L\), simultaneously.

We will prove this by contradiction. Assume that in an optimal solution, there exists a pair \((i, j)\) such that \(i < j\), \(P_i > P_j\), \(P_i\) is of type \(R\), and \(P_j\) is of type \(L\). Take the pair \((i, j)\) where \(j - i\) is the largest. By changing \(P_i\) to type \(L\) and \(P_j\) to type \(R\), we can reduce the inversion count by at least \(1\) (the detail is omitted as it is just a case-by-case calculation). This contradicts the optimality of the original solution. Thus, the proposition is proved.

For each pair \((i, j)\) (\(1 \leq i < j \leq N\)), consider how much it contributes to the final inversion count based on the types of \(P_i\) and \(P_j\). The summary is shown in the following table:

```
P_i < P_j P_i > P_j
i\j| L | R | i\j| L | R |
L | 0 | 0 | L | 1 | 0 |
R | 1 | 0 | R | 1 | 1 |
```

However, according to the previously proven proposition, the case where \(P_i > P_j\) and \(i, j\) are of types \(R, L\) respectively does not appear in the optimal solution. Therefore, increasing the cost of this case does not change the optimal solution. This is rather sudden, but we set the cost of this case to \(3\).

```
P_i < P_j P_i > P_j i<j i<j, P_i > P_j
i\j| L | R | i\j| L | R | i\j| L | R | | L | R |
L | 0 | 0 | L | 1 | 0 | -> L | 0 | 0 | + i | 0 | 1 |
R | 1 | 0 | R | 3 | 1 | R | 1 | 0 | j | 1 | 0 |
```

This cost can be represented by a minimum cut. Specifically, the original problem reduces to finding the minimum \(S-T\) cut in a graph obtained by the following steps:

- Prepare super vertices \(S, T\) and vertices \(1, 2, \cdots, N\).
- For each pair \((i, j)\) (\(1 \leq i < j \leq N\)), add an edge from \(j\) to \(i\) with capacity \(1\).
- For each pair \((i, j)\) (\(1 \leq i < j \leq N\), \(P_i > P_j\)), add edges from \(S\) to \(i\) and from \(j\) to \(T\) with capacity \(1\).

For each vertex \(1 \leq i \leq N\), let \(X_i\) be the capacity of the edge from \(S\) to \(i\) and \(Y_i\) be the capacity of the edge from \(i\) to \(T\). \(X_i\) and \(Y_i\) are easily determined. Next, find the minimum cut = maximum flow capacity of this graph.

For each vertex \(1 \leq i \leq N\), define its influx \(f_i\) as \((\text{flow into } i) - (\text{flow out of } i)\). In a valid \(S-T\) flow, \(f_i = 0\) must be satisfied, but we consider flowing appropriate flows through each edge without this constraint.

First, for all \(i\), flow the maximum capacity through the \(S \to i\) edges and the \(i \to T\) edges. Then, consider how to use the \(j \to i\) edges (\(i < j\)). Using the \(j \to i\) edge increases \(f_i\) by \(1\) and decreases \(f_j\) by \(1\).

After deciding which \(j \to i\) edges to use, we need to adjust the current flow to a valid \(S-T\) flow and then find the maximum flow capacity, but this is actually simple. For each vertex \(i\), if \(f_i > 0\), push back \(f_i\) to \(S\), and if \(f_i < 0\), push back \(|f_i|\) to \(T\). It seems difficult to handle cases where \(f_i > X_i\) or \(|f_i| > Y_i\), but we do not need to consider such cases. Just think of initially adding \(10^{100}\) to each of \(X_i\) and \(Y_i\).

In the end, the goal is to minimize \(\sum_{1 \leq i \leq N} |f_i|\).

This can be solved greedily as follows:

- For each \(j = 1, 2, \cdots, N\), perform the following operation:
- Sort the current \(f_1, \cdots, f_{j-1}\) in ascending order to get \(g_1, \cdots, g_{j-1}\). For each \(1 \leq i < j\), if \(g_i + 1 \leq f_j - i\), use the \(j \to i\) edge.

Proof of the correctness of the greedy method: At any point, if \(y - x \geq 2\), it is permissible to change \((x, y) \to (x + 1, y - 1)\). This can be understood by running an induction from the back. The operation of doing \((x, y) \to (x + 1, y - 1)\) as much as possible matches the greedy method described above.

Finally, we want to execute this greedy method quickly. The simplest way to consider but difficult to implement is using a balanced binary search tree. However, it can actually be implemented using only Fenwick Trees. We will manage the number of \(i\) satisfying \(f_i = v\) for each \(v\) (\(-N \leq v \leq N\)).

The most difficult query is “Given \(k\), change \(c_{i+1} := c_i\) for all \(i < k\).”

To handle this quickly, we prepare sequences \(a\) and \(b\). In short, \(a\) manages active indices. More precisely, each element of \(a\) is \(0\) or \(1\). If the \(x\) is the position of the \(i\)-th \(1\) in \(a\), the value of \(c_i\) is stored in \(b_x\). This way, the most difficult query mentioned earlier can be handled by changing one element of \(a\) from \(1\) to \(0\). Note that \(i\) can be negative, but this can be resolved by adding an appropriate offset.

By managing the values of \(a\), \(b\), and \(a_i + b_i\) with Fenwick Trees, all operations required by the greedy method can be processed in \(O(\log N)\) time. Therefore, this problem can be solved in \(O(N \log N)\) time overall.

posted:

last update: