Official

B - Most Frequent Substrings Editorial by en_translator


Approach

For this problem, it is convenient to use a data structure called an associative array (like std::map in C++ and dictionary in Python). Associative arrays allow you to use arbitrary values — not necessarily integers — as keys; insertion and deletion of keys, as well as retrieving and updating the value associated to a specified key, can be done fast (for example in a logarithmic time of the size).

In our problem, we maintain an associative array \(memo\) indexed by strings, where the value \(memo[t]\) for a key \(t\) stores the number of occurrences of \(t\). Specifically, the algorithm can be described as follows:

  • For integers \(i = 1, \dots, N-K+1\), do the following:
    • Retrieve the substring consisting of the \(i\)-th through \((i+K-1)\)-th characters of \(S\). Denote it by \(t\).
    • Increment the value \(memo[t]\) by \(1\).

After \(memo\) is ready, we find the maximum value among the frequency counts \(memo[t]\). This can be done by setting an initial value \(ans = 0\) and applying the following step for all keys \(t\):

  • if \(memo[t]\) is greater than \(ans\), replace \(ans\) with \(memo[t]\).

Afterward, we can further perform the following step for all indices \(t\) again, so as to enumerate the strings that occurs exactly \(ans\) times:

  • if \(memo[t]\) equals \(ans\), add \(t\) to the answers.

Sample code (C++)

std::map internally sorts the elements in a strict weak ordering (spec draft). Also, in C++ strings are by default compared lexicographically. The for statement enumerates the elements in lexicographical order, allowing us to simplify the implementation.

#include <iostream>
using std::cin;
using std::cout;
using std::min;
using std::max;
#include <vector>
using std::vector;
#include <string>
using std::string;
#include <map>
using std::map;


int main (void) {
	// input
	int n, k;
	string s;
	cin >> n >> k;
	cin >> s;

	// make a dictionary: memo[t] := #appearance of t
	map<string, int> memo;
	for (int i = 0; i < n-k+1; i++) {
		string t = s.substr(i, k);
		memo[t] += 1;
	}
	// find max appearance
	int vmax = 0;
	for (auto &[key, val] : memo) {
		vmax = max(vmax, val);
	}
	// list all t with max appearance (for-map lists all keys in lexicographical order)
	vector<string> vs;
	for (auto &[key, val] : memo) {
		if (val == vmax) vs.push_back(key);
	}

	// output
	cout << vmax << "\n";
	for (int i = 0; i < vs.size(); i++) {
		if (i > 0) cout << " ";
		cout << vs[i];
	}
	cout << "\n";

	return 0;
}

posted:
last update: