Official

D - Unique Username Editorial by en_translator


Preface

In this problem, strings of lengths at most \(16\) are constructed and compared many times. In this editorial, we assume it can be done in an \(\mathrm{O}(1)\) times.

Overview of the solution

This problem can be solved by effectively generating strings satisfying the first and second conditions and checking if it coincides with any of \(T_1,T_2,\ldots,T_M\) fast enough.

Generating strings satisfying the first and second conditions

The first sentence of the first condition reads “let \(S_1', S_2', \ldots,S_N'\) be a permutation of \(S_1', S_2', \ldots,S_N'\).” Such a permutation can be generated with, in C++ for example, next_permutation (Reference: an example of past problem that requires next_permutation). What follows is “let \(X\) be the concatenation of \(S_1'\), (\(1\) or more copies of _), \(S_2'\), (\(1\) or more copies of _), \(\ldots\), (\(1\) or more copies of _), and \(S_N'\).” Also, the second condition says “the length of \(X\) is between \(3\) and \(16\).” There are many ways to generate them; we will introduce an approach using recursion.
Let the argument of the recursive function be the string under construction and how many of \(S_1',S_2',\ldots,S_N'\) has been consumed. If the string requires _ at the tail, append it; otherwise, append the next string from \(S_1',S_2',\ldots,S_N'\) or _ if possible (i.e. if there is enough space to append the unused strings of \(S_1',S_2',\ldots,S_N'\) and mandatory _’s afterwards). Finally, if \(S_1',S_2',\ldots,S_N'\) are all consumed, a string satisfying the first and second conditions has been generated.
By determining if the generated string coincides with any of \(T_1,\ldots,T_M\), and print the string and exit the program if it doesn’t, you can find a sought string (or determine that no string is applicable), and your code will be accepted for this problem. In the sample code, the number of strings consumed from \(S_1',S_2',\ldots,S_N'\) is stored in the string being constructed as ans. Also, it maintains the number of _’s that may be appended more as well as \(S_1',S_2',\ldots,S_N'\) and \(T_1,T_2,\ldots,T_M\), but it is up to you whether to use global variables instead.

Checking if the generated string coincides with any of the \(M\) strings

When checking if the generated string coincides with any of \(T_1,T_2,\ldots,T_M\), we cannot spend an \(\mathrm{O}(M)\) time each, because a considerable amount of strings are generated (as described later).
One way to determine efficiently if a sequence of data (strings, this time) \(T\) contains a data \(x\) is to sort the sequence beforehand and do the binary search. In most languages such as C++, we can make a conditional branch that depends on whether it contains \(x\) by:

if(binary_search(T.begin(),T.end(),x)){
	//something
}

This costs an \(\mathrm{O}(\log |T|)\) time each, which is fast enough for this problem.

Estimating the execution time

We will estimate how much time does it need to execute the algorithm above. Generating a string costs \(\mathrm{O}(1)\), and checking costs \(\mathrm{O}(\log M)\), so all that left is how many strings are generated.
Firs, the strings generated by the way in this editorial are distinct (rather, we introduce how to generate distinct strings). If the generated string does not coincide with \(T_1,T_2,\ldots,T_M\), it prints it and exits, and continue executing otherwise. Here, it coincides with any of \(T_1,T_2,\ldots,T_M\) at most \(M\) times (by the Pigeonhole Principle). Thus, the complexity can be evaluated as \(\mathrm{O}(M\log M)\).
By the way, some of you may considered generating entire possible strings and then check for duplicates. In this hypothetical approach, the program never terminates until generating all the strings satisfying the first and second conditions, so we need to count the number of generated strings.
Suppose \(N\geq 2\). For a fixed possible \(N\), the worst case is when there are \(N\) strings of length \(1\) each. First, there are \(N!\) permutations \(S_1',S_2',\ldots,S_N'\). Plus, each has as many number of ways to generate strings as the number of inserting \(16-(2N-1)\) _’s to \(N\) slots (where the first \(N-1\) slots fit between \(S_1',S_2',\ldots,S_N'\), and the other is discarded if the total length is less than \(16\)). Therefore, at worst \(N!_{16-N}C_{N-1}\) strings are generated, which is small enough in the range of \(2\leq N \leq 8\).
By the way, we do not have to do such a evaluation but just write a code to generate them and count the number of executions of the recursive function to assert that there is no problem in the execution time (which I personally recommend).

Sample code

#include <bits/stdc++.h>
using namespace std;

void dfs(int cur,vector<string> &S,vector<string> &T,int remain,string ans){
	
	if(remain<0)return;
	
	if(cur == S.size()){
		if(ans.size()>=3&&!binary_search(T.begin(),T.end(),ans)){
			cout<<ans<<endl;
			exit(0);
		}
		return;
	}
	
	if(ans.size()>0 && ans.back()!='_'){
		dfs(cur,S,T,remain,ans + "_");
	}
	else{
		dfs(cur+1,S,T,remain,ans + S[cur]);
		if(ans.size()>0)dfs(cur,S,T,remain-1,ans + "_");
	}
	
}

int main() {
    
	int N,M;
	cin>>N>>M;
	
	vector<string> S(N);
	for(int i=0;i<N;i++)cin>>S[i];
	sort(S.begin(),S.end());
	
	vector<string> T(M);
	for(int i=0;i<M;i++)cin>>T[i];
	sort(T.begin(),T.end());
	
	int remain = 16;
	for(int i=0;i<N;i++)remain -= S[i].size();
	remain -= N-1;	
	
	do{
		dfs(0,S,T,remain,"");
	}
	while(next_permutation(S.begin(),S.end()));
	
	cout<<-1<<endl;
	
	return 0;
}

posted:
last update: