G - Select Strings Editorial by en_translator
For simplicity, we assume that \(S_i \neq S_j ( i \neq j)\).
Consider an \(N\)-vertex graph \(G\) that has a directed edge from vertex \(i\) to vertex \(j\) if \(S_i\) is a substring of \(S_j\) for the given \(S_1,S_2, \ldots ,S_N\). Then, \(G\) forms a DAG (Directed Acyclic Graph).
Also, suppose that each vertex has a weight of \(A_1,A_2, \ldots , A_n\).
Then, this problem can be rephrased as follows.
You are given a vertex-weighted \(N\)-vertex DAG \(G\).
Choose some vertices out of the \(N\) vertices without choosing a pair of vertices that has a path between them.
Find the maximum total weight of the chosen vertex.
First, let us ignore weights and try to maximize the number of chosen vertices.
If \(G\) has edges from vertex \(i\) \(\to\) vertex \(j\) and vertex \(j\) \(\to\) vertex \(k\), then it also has an edge from vertex \(i\) \(\to\) vertex \(k\). In such a DAG, one can use Dilworth’s theorem.
Dilworth’s theorem
For a DAG \(G\) such that if it has edges from vertex \(i\) \(\to\) vertex \(j\) and vertex \(j\) \(\to\) vertex \(k\), then it also has an edge from vertex \(i\) \(\to\) vertex \(k\),
\((\text{the size of a minimum path cover}) = (\text{the size of a maximum independent set})\)..
If the graph is not weighted, the problem is equivalent to finding the size of a maximum independent set, or a minimum path cover. (The size of a minimum path cover refers to the minimum number of paths when decomposing vertices of \(G\) into paths.)
In fact, the minimum path cover can be boiled down to the maximum matching of a bipartite graph.
Details
We will correspond the \(N\)-vertex DAG \(G\) onto a bipartite graph \(H\) with left vertices \(1,2,\ldots, N\) and right vertices \(1',2',\ldots,N'\). Then it holds that
\(N-(\text{maximum matching of }H)=(\text{size of a minimum path cover of }G)\),
because
\((\text{size of a path cover})=N-(\text{number of edges in the path cover})\),
so minimizing the size of a path cover corresponds to maximizing the number of edges in the path cover, and matchings of \(H\) and path covers of \(H\) correspond one-by-one.
Hence, minimum path cover can be boiled down to the maximum matching problem on the bipartite graph.
When actually solving the maximum matching problem, we prepare additional vertices \(S\) and \(T\), add edges \(S \to v\) and \(v' \to T\) for each vertex \(T\), and set the maximum capacity of every edge to \(1\), in order to boil it down to a maximum flow problem.
Therefore, the problem has been solved for a weighted graph (or a graph with equal weights). However, in this problem we need to find the maximum total weight of the chosen vertices.
In fact, this problem can be boiled down in the same way as the maximum matching problem of the bipartite graph without weights, considering copies of vertices according to the weights.
In this problem, the weight is very large, so we cannot have the same number of copies as the weight, but we can set the maximum capacity of \(S \to v\) and \(v' \to T\) to \(A_v\) and that of \(i \to j'\) to \(\infty\) for each vertex, in order to obtain the same answer as having the copies of vertices as their weights.
In other words, \(( \sum A_i - (\text{maximum flow of }S \to T))\) is the answer.
On implementation, one also have to consider how to add edges between vertices \(i\) and \(j\) with \(S_i=S_j\); one can solve it by adding edges \(i\to j\) if \(S_i=S_j\) and \(i<j\) to make \(G\) always a DAG. Alternatively, one can consider the vertices with the maximum \(A\) valid among the same strings.
Regarding the complexity, one can determine if \(S_i\) is a substring of \(S_j\) for all pairs \(i,j\) in a total of \(O(N^{2}+(\sum |S_i|)^2)\) time, and the maximum flow of a bipartite graph with \(O(N)\) vertices and \(O(N^{2})\) edges can be found in \(O(N^{4})\) time using ACL (AtCoder Library), so the problem can be solved in a total of \(O(N^{4} + (\sum |S_i|)^{2})\) time.
Sample code (C++):
#include <bits/stdc++.h>
#include <atcoder/maxflow>
using namespace std;
using ll = long long;
static constexpr ll inf = 1e10;
bool is_substring(const string& s, const string& t) {
const int n = s.size(), m = t.size();
for (int i = 0; i + m <= n; ++i) {
if (s.substr(i, m) == t) return true;
}
return false;
}
int main() {
int n;
cin >> n;
vector<string> s(n);
vector<ll> a(n);
for (int i = 0; i < n; ++i) cin >> s[i];
for (int i = 0; i < n; ++i) cin >> a[i];
atcoder::mf_graph<ll> g(2 * n + 2);
int source = 2 * n, sink = 2 * n + 1;
for (int i = 0; i < n; ++i) {
g.add_edge(source, i, a[i]);
g.add_edge(i + n, sink, a[i]);
}
for (int i = 0; i < n; ++i) {
for (int j = 0; j < n; ++j) {
if (i != j) {
if (s[i] == s[j]) {
if (i < j) {
g.add_edge(i, j + n, inf);
}
} else if (is_substring(s[i], s[j])) {
g.add_edge(i, j + n, inf);
}
}
}
}
const ll flow = g.flow(source, sink);
const ll sum = accumulate(a.begin(), a.end(), 0LL);
cout << sum - flow << endl;
}
posted:
last update: