G - Dynamic Scheduling Editorial by en_translator
There are various solutions for this problem (including one that utilizes properties of a matroid).
In this editorial, we introduce an approach that formulates the problem as a cost-flow problem and optimizes it.
Preprocessing with segment-tree divide-and-conquer
First of all, for a problem with offline queries (where queries are all given beforehand), one can simplify the queries with a technique called segment-tree divide-and-conquer.
- Although very typical, this technique does not seem to have an established name, so I have given it a name following OI Wiki.
The queries asked in this problem can be rephrased as follows.
There is a set of tasks, initially empty. Process the following queries:
- Insert a task to the set (insertion query)
- Remove a task from the set (removal query)
- Print the answer to the subproblem
Simply put, we need a data structure that supports insertion and removal queries.
In general, a data structure that manages a set rarely supports removal queries unless the structure is quite simple. For example, a Disjoint Set Union does not support removal queries.
With the segment-tree divide-and-conquer trick, a removal query can be turned into an undo query. That is:
There is a set of tasks, initially empty. Process the following queries:
- Insert a task to the set (insertion query)
- Remove the last task inserted to the set (undo query)
- Print the answer to the subproblem
Unlike a removal query, an undo query can often be supported by recording the differences from the previous state on a stack, without worsening the computational complexity. For example, a Disjoint Set Union can easily support undo queries with this trick.
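As an illustration, here is a minimal sketch of a Disjoint Set Union with rollback. It is not needed for this particular problem, but it shows the undo-via-stack pattern: every merge changes only O(1) words, and those words are pushed onto a stack so that the last merge can be reverted. The class and member names are our own.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Disjoint Set Union with undo: union by size only (no path compression),
// so each merge touches O(1) words, which are recorded for rollback.
struct RollbackDSU {
    vector<int> parent, sz;
    // history entries: (index whose parent changed, index whose size changed, old size)
    vector<tuple<int, int, int>> history;

    explicit RollbackDSU(int n) : parent(n), sz(n, 1) {
        iota(parent.begin(), parent.end(), 0);
    }
    int leader(int v) const {
        while (parent[v] != v) v = parent[v];   // no path compression: keeps undo O(1)
        return v;
    }
    void merge(int a, int b) {
        a = leader(a); b = leader(b);
        if (a == b) { history.emplace_back(-1, -1, -1); return; }  // record a no-op
        if (sz[a] < sz[b]) swap(a, b);
        history.emplace_back(b, a, sz[a]);
        parent[b] = a;
        sz[a] += sz[b];
    }
    void undo() {                                // revert the most recent merge()
        auto [child, root, old_sz] = history.back();
        history.pop_back();
        if (child == -1) return;
        parent[child] = child;
        sz[root] = old_sz;
    }
};
```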
Thus, by applying segment-tree divide-and-conquer, we no longer have to support removal queries and can focus only on insertion queries, which significantly simplifies the problem in some cases.
We now introduce the procedure of segment-tree divide-and-conquer.
First, we prepare a segment tree to manage the timeline. The timeline here refers to assigning the timing of processing the first query to time \(0\), second query to time \(1\), and so on. Each node of a segment tree stores a vector that manages elements.
Next, we preprocess the input appropriately to compute, for each element to be inserted into the set, the time span during which that element exists. Then, just like processing a range-update query on a segment tree, we insert the element into the nodes corresponding to \([l, r)\).
After the insertions, start a DFS (Depth-First Search) from the root of the segment tree. In the DFS, when you enter a node, insert the elements stored in that node's vector; when you leave the node, undo those insertions. In this DFS, the state of the set when arriving at the \(i\)-th leaf corresponds to the state of the set at the time the \(i\)-th query should be processed.
- For details, please refer to p.52 and later of ei1333’s slides (text in Japanese), which include colored visualizations using Disjoint Set Union (also known as Union-Find) as the example.
The overall complexity is \(\mathrm{O}(Q \log T f(n))\), where \(Q\) is the number of queries, \(T\) is the maximum time, and \(\mathrm{O}(f(n))\) is the complexity of an insertion and an undo query.
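A minimal sketch of this procedure is shown below. The callbacks `apply`, `undo`, and `answer` are placeholders for whatever structure is being maintained (for instance the rollback DSU above); the names are our own.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Segment-tree divide-and-conquer over the timeline.
// apply(id): insert element id into the maintained structure.
// undo():    roll back the most recent insertion.
// answer(t): called once per time point t, with exactly the elements alive at t applied.
struct SegtreeDnC {
    int size, T;
    vector<vector<int>> node;            // element ids stored on each segment-tree node
    function<void(int)> apply;           // set these three callbacks before calling run()
    function<void()> undo;
    function<void(int)> answer;

    explicit SegtreeDnC(int T) : T(T) {
        size = 1;
        while (size < T) size *= 2;
        node.assign(2 * size, {});
    }
    // element `id` is alive during the half-open time interval [l, r)
    void add(int l, int r, int id) {
        for (l += size, r += size; l < r; l >>= 1, r >>= 1) {
            if (l & 1) node[l++].push_back(id);
            if (r & 1) node[--r].push_back(id);
        }
    }
    void run(int v = 1) {
        for (int id : node[v]) apply(id);            // entering: insert
        if (v >= size) {
            if (v - size < T) answer(v - size);      // leaf = one time point
        } else {
            run(2 * v);
            run(2 * v + 1);
        }
        for (size_t i = 0; i < node[v].size(); ++i) undo();  // leaving: roll back
    }
};
```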
Formalization as a cost-flow problem
Now that we have applied segment-tree divide-and-conquer, the problem boils down to the following.
There is a set of tasks, initially empty. Process \(\mathrm{O}((N + Q) \log Q)\) queries of the following types:
- Insert a new task to the set
- Remove the last task inserted to the set
- Print the answer to the subproblem
We ignore undo queries for now; as described above, they can be processed within the same complexity with appropriate bookkeeping. Also, assuming that we compute the optimal solution after every insertion query, we are left with only one kind of query:
There is a set of tasks, initially empty. Process \(\mathrm{O}((N + Q) \log Q)\) queries of the following type:
- Insert a new task to the set, and print the answer to the subproblem
We will now try to solve this problem.
The problem can be described as a cost-flow problem as follows.
There is a graph with \((2N+2)\) vertices labeled \(S, A_1, \dots, A_N, B_1, \dots, B_N, T\).
The edges are added as follows.
- an edge \(A_i \to B_j\) with capacity \(1\) and cost \(0\) for each pair \((i, j)\) with \(j \leq D_i\)
- an edge \(B_j \to T\) with capacity \(1\) and cost \(0\) for each \(j\)
- an edge \(T \to S\) with capacity \(\infty\) and cost \(0\)
The queries are then represented as follows.
- When inserting a task \(i\), add an edge \(S \to A_i\) with capacity \(1\) and cost \(-P_i\), and find the minimum cost circulation multiplied by \(-1\).
For the remainder of this discussion, we try to solve this problem.
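Before proceeding, here is a naive reference for a single snapshot, useful only for checking the formulation on small cases: since the \(0\)-cost middle edges simply encode a bipartite matching between chosen tasks and days, the negated minimum-cost circulation equals the maximum total reward over subsets of tasks that can all be assigned to distinct days no later than their deadlines. The sketch below enumerates subsets and checks each with a simple augmenting-path matching; it is exponential, the names are placeholders, and it is meant purely as a testing aid.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Augmenting-path (Kuhn) step: try to match chosen task v to some day.
bool try_kuhn(int v, const vector<vector<int>>& adj, vector<int>& match_day, vector<char>& used) {
    for (int day : adj[v]) {
        if (used[day]) continue;
        used[day] = true;
        if (match_day[day] == -1 || try_kuhn(match_day[day], adj, match_day, used)) {
            match_day[day] = v;
            return true;
        }
    }
    return false;
}

// Brute force over subsets of the current tasks: keep the best total reward among
// subsets whose tasks can all be matched to distinct days <= their deadlines.
long long snapshot_answer(const vector<int>& D, const vector<long long>& P, int N) {
    int n = D.size();
    long long best = 0;
    for (int mask = 0; mask < (1 << n); ++mask) {
        vector<int> ids;
        long long reward = 0;
        for (int i = 0; i < n; ++i)
            if (mask >> i & 1) { ids.push_back(i); reward += P[i]; }
        vector<vector<int>> adj(ids.size());
        for (size_t k = 0; k < ids.size(); ++k)
            for (int day = 1; day <= D[ids[k]]; ++day) adj[k].push_back(day);
        vector<int> match_day(N + 1, -1);
        bool ok = true;
        for (size_t k = 0; k < ids.size() && ok; ++k) {
            vector<char> used(N + 1, false);
            ok = try_kuhn(k, adj, match_day, used);
        }
        if (ok) best = max(best, reward);
    }
    return best;
}
```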
Before processing each query, we always maintain the residual graph of an optimal cost flow (so that the residual graph never has a negative cycle). Then, considering what happens when a query adds an edge \(S \to A_i\) to the graph, we actually have the following fact:
Lemma
When a query adds an edge, performing the following action at most once removes every negative cycle from the residual graph: if there is a negative cycle, cancel a negative cycle with the smallest cost.
(Proof) If adding an edge does not produce a negative cycle, the optimality is kept without performing the action, so nothing is required. We now assume that a new negative cycle has arisen.
Since the previous residual graph has no negative cycle, the new negative cycle contains the added edge. (Thus, the amount of flow needed to cancel the negative cycle is \(1\).) Let \(C\) be such a negative cycle with the smallest cost. Also, denote by \(G\) the graph before the addition, and by \(G'\) the graph after adding the edge and pushing a flow of amount \(1\) along \(C\). It is sufficient to show that \(G'\) has no negative cycle.
Assume that \(G'\) has a negative cycle, denoting it by \(C'\).
Since the capacity of the new edge is \(1\), \(C'\) does not contain the new edge. Also, if the edges of \(C'\) consisted only of edges of \(G\), the optimality of \(G\) would be violated. Thus, \(C'\) contains a reverse edge created by pushing the flow along \(C\).
Here, consider the graph consisting of the edges of \(C\), each with flow amount \(1\), and the edges of \(C'\), each with flow amount \(1\). This graph decomposes into the following three kinds of edge sets:
- One or more pairs of an edge in \(C\) and its reverse edge contained in \(C'\).
- Zero or more cycles in \(G\).
- Zero or one cycle consisting of the added edge and edges in \(G\).
The total cost of the edges of this graph is negative, since \(C\) and \(C'\) are both negative cycles. On the other hand, the edges in 1. contribute \(0\) in total, and the cycles in 2. have nonnegative cost because they consist of edges of \(G\). Thus, if no cycle of type 3. existed, the total cost of 1., 2., and 3. would be nonnegative, which is a contradiction. If such a cycle \(C''\) exists, then, denoting by \(w(c)\) the cost of a cycle \(c\), we have
\[\begin{aligned} w(C) + w(C') &= (\text{contribution of 1.}) + (\text{contribution of 2.}) + (\text{contribution of 3.}) \geq w(C''), \end{aligned}\]
and hence
\[ w(C) \geq w(C'') - w(C') > w(C''). \]
Thus, \(C''\) has a smaller cost than \(C\), which contradicts the minimality of the cost of the cycle \(C\).
Hence, the assumption that \(C'\) exists leads to a contradiction, so \(G'\) has no negative cycle and is the residual graph of an optimal cost flow. (End of proof)
We have now overcome the most difficult part of this problem (the part the author struggled with), but this rephrasing and the lemma itself do not by themselves lead to an efficient algorithm (finding the negative cycle with the smallest cost is NP-hard in general). Instead, we use the structure of the graph to simulate the cost flow fast.
Simulating the cost flow with the task set
Dealing with the cost flow directly does not lead to an efficient algorithm, so we rephrase the operations on the cost flow as operations on the task set.
First, the constructed graph is a flow representation of bipartite matching. So the set of \(i\) such that the edge \(S \to A_i\) carries flow corresponds to the set of tasks that are finished by their deadlines in the subproblem. Thus, a residual graph corresponds to a set of tasks completed by their deadlines in the subproblem. Hereinafter, we denote by \(X\) the set of tasks completed by their deadlines, and maintain \(X\) instead of the residual graph. In query form, it is represented as follows.
There is a set of tasks, initially empty.
Let \(X\) be the set of tasks that can be completed before the deadline in the optimal solution for the current task set.
Process \(\mathrm{O}((N + Q) \log Q)\) queries of the following type:
- Insert a new task \(i\) to the set, also updating \(X\). Print the total reward of the tasks in \(X\).
Now we consider the “also updating \(X\)” part in detail.
Let us first consider how the cancellation of a negative cycle can be expressed using \(X\). When the residual graph contains a negative cycle, the negative cycle has one of the following forms:
- \(S \to A_i \to B_* \to A_* \to \dots \to B_* \to T \to S\)
- \(S \to A_i \to B_* \to A_* \to \dots \to B_* \to A_j \to S\)
These two cycles can be rephrased as the following operations on the task set, respectively:
- Insert a task \(i\) to \(X\).
- Remove task \(j\) from \(X\), and insert task \(i\).
Conversely, one can prove that if one of these operations is possible, then a corresponding negative cycle exists. Thus, the feasibility of the two operations is equivalent to the existence of the two types of negative cycles.
(Outline of proof) We prove the correspondence for the former. (The same applies to the latter.) By superimposing the residual graph for \(X\) and that for \(X + \lbrace i \rbrace\) and removing self-loops, one can check that the result consists of \(S \to A_i \to B_* \to \dots \to B_* \to T \to S\) and zero or more cost-\(0\) cycles, and the former can be shown to consist of edges of \(G\). (End of proof)
Therefore, a negative cycle and its removal have been reworded by operations against the task set.
Next, let us rephrase the process of updating the residual graph as an operation on \(X\). By the lemma, updating the residual graph in the cost-flow procedure can be done as follows:
- If there is a negative cycle, cancel one with the minimum cost.
The “minimum cost” part is precisely described as follows:
- If there is a negative cycle of form \(S \to A_i \to B_* \to \dots \to B_* \to T \to S\), push a flow.
- Otherwise, if there is a negative cycle of form \(S \to A_i \to B_* \to \dots \to B_* \to A_j \to S\), push a flow to the negative cycle with the smallest cost.
- Otherwise, do nothing.
This is equivalent to the following operation against the task set:
- If task \(i\) can be added to \(X\), add it.
- Otherwise, if there exists \(j\) such that \(P_i \gt P_j\) and one can insert task \(i\) by removing task \(j\) from \(X\), then choose \(j\) with the minimum \(P_j\), remove \(j\) from \(X\), and insert task \(i\).
- Do nothing otherwise.
By the discussion above, the cost-flow formulation has been translated into operations on the task set.
The operations are now quite concrete. The problem to solve can be stated in query form as follows:
There is a set of tasks, initially empty.
Let \(X\) be the set of tasks that can be completed before the deadline in the optimal solution for the current task set.
Process \(\mathrm{O}((N + Q) \log Q)\) queries of the following type:
- Insert a new task \(i\) to the set, and update \(X\) by performing the following operation:
  - If task \(i\) can be added to \(X\) (keeping \(X\) valid), add it.
  - Otherwise, if there exists \(j\) such that \(P_i \gt P_j\) and task \(i\) can be inserted by removing task \(j\) from \(X\), then choose such \(j\) with the minimum \(P_j\), remove \(j\) from \(X\), and insert task \(i\).
  - Do nothing otherwise.
- Then print the total reward of the tasks in \(X\).
It’s been a long way, but we’re near the goal!
Optimization using Hall’s marriage theorem
In order to process the query above, let us construct a data structure as follows.
Let \(X\) be a set of tasks. We call \(X\) valid if all the tasks in \(X\) can be completed by their deadlines in the subproblem. Construct a data structure that supports the following queries.
- Insert task \(i\) to \(X\).
- Remove task \(i\) from \(X\).
- Determine if \(X\) is valid.
- If \(X\) is not valid, find the task \(i\) with the minimum \(P_i\) such that removing \(i\) from \(X\) makes it valid.
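Assuming a structure with exactly these four operations, one insertion query can be handled as in the following sketch: it tentatively inserts the task and then either keeps it, swaps out the cheapest blocking task, or undoes the insertion, which is equivalent to the three cases derived earlier. The interface and names are placeholders for what we build below.

```cpp
// TaskSet is a hypothetical structure with the four operations just listed:
//   insert(i), remove(i), valid(), and cheapest_removable() -> {P_j, j}
//   (the removable task with the smallest reward when X is invalid).
template <class TaskSet>
long long process_insertion(TaskSet& ds, long long& total, int i, long long P_i) {
    ds.insert(i);                        // tentatively add task i to X
    total += P_i;
    if (!ds.valid()) {                   // task i does not fit as it is
        auto [P_j, j] = ds.cheapest_removable();
        if (P_j < P_i) {                 // profitable swap: drop the cheapest blocking task
            ds.remove(j);
            total -= P_j;
        } else {                         // no profitable swap: undo the tentative insertion
            ds.remove(i);
            total -= P_i;
        }
    }
    return total;                        // the value to print for this query
}
```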
Trying to process all four kinds of queries at once is too difficult, so let us first consider the first and the third:
- Insert task \(i\) to \(X\).
- Determine if \(X\) is valid.
In order to process these two kinds of queries, consider maintaining information on tasks somehow. Recalling the requirement of the subproblem, the validity of \(X\) can be represented as a matching on a bipartite graph as follows:
Let \(U\) be the vertex set corresponding to the tasks in \(X\), and \(V\) the vertex set corresponding to the days. For \(u \in U\) and \(v \in V\), add an edge \(u \to v\) if day \(v\) is on or before the deadline of task \(u\). Then \(X\) is valid if and only if there exists a matching that covers \(U\).
When it comes to determining whether there is a matching that covers all the vertices on one side of a bipartite graph, Hall’s marriage theorem is the tool to reach for. Let us reformulate the condition using Hall’s marriage theorem.
For \(A \subseteq U\), let \(\Gamma(A)\) be the set of vertices adjacent to \(A\). Hall’s marriage theorem is expressed as follows:
Hall’s marriage theorem
There exists a matching that covers the vertices in \(U\) if and only if:
\[\forall A \subseteq U, |A| \leq |\Gamma(A)|.\]
This inequality is a condition on subsets of the \(U\) side, but transforming it yields a condition on subsets of the \(V\) side. (Proof omitted; it can be understood by considering what the inequality means.)
Modified Hall’s theorem
There exists a matching that covers the vertices in \(U\) if and only if:
\[\forall B \subseteq V, |\lbrace u \vert u \in U, \Gamma(u) \subseteq B \rbrace| \leq |B|.\]
By deriving a condition on subsets of the \(V\) side like this, we can exploit the structure of this problem.
Let \(I_n\) be the subset of \(V\) consisting of day \(1\), day \(2\), \(\dots\), day \(n\). For each task \(i\), the corresponding vertex has edges exactly to the vertices of \(I_{D_i}\). Consequently, it suffices to check the inequality above only for \(B = I_1, I_2, \dots, I_N\): for an arbitrary \(B\), replacing \(B\) by \(I_n\), where \(n\) is the largest index with \(I_n \subseteq B\), leaves the left-hand side unchanged while not increasing the right-hand side.
For \(B = I_n\), the inequality can be rewritten as follows:
\[ \begin{aligned} &|\lbrace u \vert u \in U, \Gamma(u) \subseteq I_n \rbrace| \leq |I_n| \\ \iff &|\lbrace u \vert u \in U, D_u \leq n \rbrace| \leq n \\ \iff &n - |\lbrace u \vert u \in U, D_u \leq n \rbrace| \geq 0. \end{aligned} \]
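As a small sanity check with a made-up example: if \(X\) consists of three tasks with deadlines \(1, 1, 3\), then \(f(1) = 1 - 2 = -1 < 0\), so \(X\) is not valid (two tasks cannot both be finished on day \(1\)); with deadlines \(1, 2, 3\) instead, \(f(1) = f(2) = f(3) = 0\) and \(f(n) = n - 3 \geq 0\) for \(n > 3\), so \(X\) is valid.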
By the fact so far, the data structure to process queries can be constructed as follows.
Let \(f(n) = n - |\lbrace u \vert u \in U, D_u \leq n \rbrace|\).
Instead of \(X\), we manage \(f(1), f(2), \dots, f(N)\) with a lazy segment tree that supports range-add and range-min.
- When inserting task \(i\) to \(X\), add \(-1\) to \(f(D_i), f(D_i + 1), \dots, f(N)\).
- Since \(X\) is valid if and only if \(f(1), f(2), \dots, f(N)\) are all non-negative, it is sufficient to check if \(\min(f(1), f(2), \dots, f(N)) \geq 0\).
Therefore, the problem has been solved for the case where only task-insertion and validity-check queries are given.
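Below is a minimal sketch of this part, assuming the AC Library (`atcoder::lazy_segtree`) is available; the wrapper class and its member names are our own. Insertion subtracts \(1\) on a suffix, removal (and undo) adds it back, and the validity check is a single range-min query.

```cpp
#include <bits/stdc++.h>
#include <atcoder/lazysegtree>
using namespace std;

// Range-add / range-min monoid for the lazy segment tree storing f(1), ..., f(N).
using S = long long;
using F = long long;
constexpr S INF = 1e18;
S op(S a, S b) { return min(a, b); }
S e() { return INF; }
S mapping(F f, S x) { return x + f; }
F composition(F f, F g) { return f + g; }
F id() { return 0; }

struct Validity {
    int N;
    atcoder::lazy_segtree<S, op, e, F, mapping, composition, id> seg;

    static vector<S> initial_f(int N) {
        vector<S> v(N);
        for (int n = 1; n <= N; ++n) v[n - 1] = n;   // f(n) = n while X is empty
        return v;
    }
    explicit Validity(int N) : N(N), seg(initial_f(N)) {}

    void insert_task(int D) { seg.apply(D - 1, N, -1); }  // f(D), ..., f(N) decrease by 1
    void remove_task(int D) { seg.apply(D - 1, N, +1); }  // removal / undo is the inverse add
    bool is_valid() { return seg.all_prod() >= 0; }       // X is valid iff min f(n) >= 0
};
```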
We have already explained the essential part, so we omit the details, but the second and fourth types of queries, which we have put aside, can also be handled. Simply put, a query of the second type can be processed in the same way as the first, and the fourth can be processed using a std::set or a point-update range-min segment tree. For all of these queries, undo is easy to implement.
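For the fourth query, one possible implementation along these lines is sketched below. The key observation (our own phrasing) is that right after a tentative insertion makes \(X\) invalid, the minimum of \(f\) is \(-1\), and removing a task \(j\) restores validity exactly when \(D_j \leq n_0\), where \(n_0\) is the first day with \(f(n_0) < 0\): positions before \(D_j\) are untouched and all later positions gain \(+1\). So it suffices to keep, for every deadline, the rewards of the tasks of \(X\) with that deadline, together with a point-update range-min segment tree over deadlines. The sketch assumes the AC Library; all names are placeholders.

```cpp
#include <bits/stdc++.h>
#include <atcoder/segtree>
using namespace std;

// Point-update range-min segment tree over deadlines: position d stores the
// cheapest (reward, id) pair among tasks currently in X whose deadline is d.
using S2 = pair<long long, int>;
S2 op2(S2 a, S2 b) { return min(a, b); }
S2 e2() { return {LLONG_MAX, -1}; }

struct CheapestRemovable {
    int N;
    vector<multiset<S2>> bucket;            // tasks of X grouped by deadline (handles ties)
    atcoder::segtree<S2, op2, e2> best;

    explicit CheapestRemovable(int N) : N(N), bucket(N + 1), best(N + 1) {}

    void insert(int id, int D, long long P) {
        bucket[D].insert({P, id});
        best.set(D, *bucket[D].begin());
    }
    void remove(int id, int D, long long P) {
        bucket[D].erase(bucket[D].find({P, id}));
        best.set(D, bucket[D].empty() ? e2() : *bucket[D].begin());
    }
    // n0 = first day with f(n0) < 0; with the lazy segment tree for f above it can be
    // obtained as  n0 = seg.max_right(0, [](S x) { return x >= 0; }) + 1.
    // Returns (reward, id) of the cheapest task whose removal restores validity.
    S2 cheapest(int n0) { return best.prod(1, n0 + 1); }   // deadlines 1..n0
};
```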
Hence, we have constructed a data structure that manages \(X\) in \(\mathrm{O}(\log N)\) time per query. Thus, the overall problem can be solved in \(\mathrm{O}((N + Q) \log Q \log N)\) time in total, which is fast enough.
Bonus
We have explained the intended solution, but the author also has an \(\mathrm{O}((N + Q) \log N)\) solution that answers all the queries correctly up his sleeve. The excuse for this awkward claim is that the author came up with the new solution only on the night before the contest and did not have time to prove its correctness. So we close this editorial by leaving the following problem as a bonus.
Bonus: solve this problem in \(\mathrm{O}((N + Q) \log N)\) time, and prove the validity.