B - Most Frequent Substrings Editorial by en_translator
Approach
For this problem, it is convenient to use a data structure called an associative array (like std::map in C++ and dictionary in Python).
Associative arrays allow you to use arbitrary values — not necessarily integers — as keys; insertion and deletion of keys, as well as retrieving and updating the value associated to a specified key, can be done fast (for example in a logarithmic time of the size).
In our problem, we maintain an associative array \(memo\) indexed by strings, where the value \(memo[t]\) for a key \(t\) stores the number of occurrences of \(t\). Specifically, the algorithm can be described as follows:
- For integers \(i = 1, \dots, N-K+1\), do the following:
- Retrieve the substring consisting of the \(i\)-th through \((i+K-1)\)-th characters of \(S\). Denote it by \(t\).
- Increment the value \(memo[t]\) by \(1\).
After \(memo\) is ready, we find the maximum value among the frequency counts \(memo[t]\). This can be done by setting an initial value \(ans = 0\) and applying the following step for all keys \(t\):
- if \(memo[t]\) is greater than \(ans\), replace \(ans\) with \(memo[t]\).
Afterward, we can further perform the following step for all indices \(t\) again, so as to enumerate the strings that occurs exactly \(ans\) times:
- if \(memo[t]\) equals \(ans\), add \(t\) to the answers.
Sample code (C++)
std::map internally sorts the elements in a strict weak ordering (spec draft). Also, in C++ strings are by default compared lexicographically. The for statement enumerates the elements in lexicographical order, allowing us to simplify the implementation.
#include <iostream>
using std::cin;
using std::cout;
using std::min;
using std::max;
#include <vector>
using std::vector;
#include <string>
using std::string;
#include <map>
using std::map;
int main (void) {
// input
int n, k;
string s;
cin >> n >> k;
cin >> s;
// make a dictionary: memo[t] := #appearance of t
map<string, int> memo;
for (int i = 0; i < n-k+1; i++) {
string t = s.substr(i, k);
memo[t] += 1;
}
// find max appearance
int vmax = 0;
for (auto &[key, val] : memo) {
vmax = max(vmax, val);
}
// list all t with max appearance (for-map lists all keys in lexicographical order)
vector<string> vs;
for (auto &[key, val] : memo) {
if (val == vmax) vs.push_back(key);
}
// output
cout << vmax << "\n";
for (int i = 0; i < vs.size(); i++) {
if (i > 0) cout << " ";
cout << vs[i];
}
cout << "\n";
return 0;
}
posted:
last update: