## G - Colorful Candies 2 Editorial by en_translator

Let \(C\) be the total number of distinct kinds of candies. Without loss of generality, we assume there are candies with Color \(1\) to Color \(C\). Obviously, the answer does not depend on the order of candies, but the number of candies with each color. For \(i = 1, 2, \ldots, C\), let \(n_i\) be the number of candies with Color \(i\).

Let us first fix the number of candies \(K\) that Takahashi chooses, and find the expected value of the number of distinct colors of candies in Takahashi’s \(K\) candies.

For \(i = 1, 2, \ldots, C\), let us define a random variable \(X_i\) to be \(X_i = 1\) if Takahashi’s \(K\) candies has one or more candy with Color \(i\), or \(X_i = 0\) otherwise.

Then, the number of distinct colors of candies in the \(K\) candies can be expressed as \(\sum_{i = 1}^C X_i\). What we want is its expected value \(E\lbrack \sum_{i=1}^C X\rbrack\).

By the linearity of expected value, \(E\lbrack \sum_{i=1}^C X \rbrack = \sum_{i=1}^C E\lbrack X_i \rbrack\). Also, by the definition of \(X_i\), \(E\lbrack X_i \rbrack\), is equal to “the probability that \(K\) candies has one or more candy with Color \(i\),” so \(E\lbrack X_i \rbrack = \lbrace \binom{N}{K} - \binom{N-n_i}{K} \rbrace / \binom{N}{K}\).

Therefore, \(E\lbrack \sum_{i=1}^C X \rbrack = \sum_{i=1}^C E\lbrack X_i \rbrack = \sum_{i=1}^C \lbrace \binom{N}{K} - \binom{N-n_i}{K} \rbrace / \binom{N}{K}\), so finding \(E\lbrack \sum_{i=1}^C X\rbrack\) boils down to computing \(\sum_{i=1}^C \lbrace \binom{N}{K} - \binom{N-n_i}{K} \rbrace\).

Since the number of kinds of candies \(C\) can be at most \(N\), so if we compute this sum naively, it costs \(\mathrm{\Theta}(N)\) computation time.
Since it costs \(\mathrm{\Theta}(N)\) time for one \(K\), if we do this for all \(K = 1, 2, \ldots, N\) it costs \(\mathrm{\Theta}(N^2)\) at worst, which does not fit in the time limit.

Now we want to compute \(\sum_{i=1}^C \lbrace \binom{N}{K} - \binom{N-n_i}{K} \rbrace\) faster.
Since \(N\) is a constant given as an input, and \(K\) is currently fixed, the value \(\binom{N}{K} - \binom{N-n_i}{K}\) only depends on the value \(n_i\). That is, if \(n_i = n_j\), then \(\binom{N}{K} - \binom{N-n_i}{K} = \binom{N}{K} - \binom{N-n_j}{K}\).
From this viewpoint, we can combine colors \(i\) with the same \(n_i\). Namely, let \(a_x\) be the number of integers \(1 \leq i \leq C\) such that \(n_i = x\), and express the value as
\(\sum_{i=1}^C \lbrace \binom{N}{K} - \binom{N-n_i}{K} \rbrace = \sum_{x = 1}^N a_x \lbrace \binom{N}{K} - \binom{N-x}{K} \rbrace\).

Here, since the total number of candies can be expressed as \(N = a_1 + 2 a_2 + 3 a_3 + \cdots N a_N\), there are at most \(\mathrm{O}(\sqrt{N})\) positive values in \(a_1, a_2, \ldots, a_N\). (This is because, letting \(M\) be the number of positive elements, \(N = a_1 + 2 a_2 + 3 a_3 + \cdots N a_N \geq a_1 + 2 a_2 + 3 a_3 + \cdots + M a_M \geq 1 + 2 + 3 + \cdots + M\))

Therefore, \(\sum_{x = 1}^N a_x \lbrace \binom{N}{K} - \binom{N-x}{K} \rbrace\) can be expressed as a sum of \(\mathrm{O}(\sqrt{N})\) terms, which can be computed in an \(\mathrm{O}(\sqrt{N})\) time.

Now that we can find the answer in an \(\mathrm{O}(\sqrt{N})\) for a single \(K\), we can do this for each \(K = 1, 2, \ldots, N\) to solve this problem in a total of \(\mathrm{O}(N\sqrt{N})\) time.

posted:

last update: