Editorial - AtCoder Beginner Contest 226

Contest Duration: 2021-11-07 21:00:00+0900 - 2021-11-07 22:40:00+0900 (local time) (100 minutes) Back to Home

Official

H - Random Kth Max Editorial by en_translator

This problem asks how to treat random variables. Before getting down to business, we will make a short write-up on the knowledge required to solve this problem.

Probability density function and cumulative distribution function

Hereinafter we denote by \(P(\mathrm{cond})\) “the probability that \(\mathrm{cond}\) happens.”

The probability density function \(f(x)\) of a random variable \(X\) represents the probability density of the event that the random variable takes some value. In other words, \(f(x)\) is a function defined so that the probability of the random value’s taking the value within a certain range can be obtained as a definite integral of \(f(x)\), satisfying the following relationship between the probability:

\[P(a\leq X \leq b) = \int_a^b f(x) dx,\]

\[\int_{-\infty}^\infty f(x)dx = 1.\]

Next, the cumulative distribution function \(F(x)\) of a random variable \(X\) is a function on “the probability that \(X \leq x\),” which can be expressed as follows:

\[\mathrm{P}(X \leq x) = F(x),\]

\[\mathrm{P}(a \lt X \leq b) = F(b) - F(a).\]

From the definition above, there is a following relationship between the probability density function and the cumulative distribution function, that is, “the integral of a probability density function is a cumulative distribution function:”

\[F(x) = \int_{-\infty}^x f(t)dt.\]

Also, if a random variable \(X\) takes the value within the range \(a \leq x \leq b\), then the relations ship betwee the expected value of \(X\), \(E[X]\), and \(f(x)\) or \(F(x)\), can be expressed by the following equations. If you inspect the equations, you can see that both expressions have the same structure to the expression of the expected value for discrete variables.

\[E[X] = \int_{a}^b xf(x)dx,\]

\[E \lbrack X \rbrack = a + \int_{a}^{b} (1 - F(x)) dx.\]

(Example problem) \(m\)-th largest number

There are \(n\) random variables \(X_1,\ldots,X_n\), each of which follows a continuous uniform distribution on \([0,1]\). What is the expected value of the \(m\)-th largest variable?

A random variable \(X\) is greater than \(x\) with probability \(1-x\), and less than \(x\) with probability \(x\). So the probability that the answer is greater than \(x\), \(\overline{F}(x) := 1 - F(x)\), can be written as

\[\overline{F}(x) = \sum_{i=m}^n \binom{n}{i} (1-x)^i x^{n-i}.\]

Therefore, the expected value \(E\) can be obtained by integrating \(\overline{F}(x)\):

\[ \begin{aligned} E &= \int_{0}^{1} \left\lbrace \sum_{i=m}^n \binom{n}{i} (1-x)^i x^{n-i} \right\rbrace dx \\ &= \sum_{i=m}^n \binom{n}{i} \int_{0}^{1} (1-x)^i x^{n-i} dx. \end{aligned} \]

By the formula of beta function,

\[\int_{0}^{1} x^m (1-x)^n dx = \frac{m!n!}{(m+n+1)!},\]

so we obtain

\[ \begin{aligned} E &= \sum_{i=m}^n \binom{n}{i} \frac{i!(n-i)!}{(n+1)!} \\ &= \sum_{i=m}^n \frac{1}{n+1} \\ &= 1 - \frac{m}{n+1}. \end{aligned} \]

The main task

For simplicity, hereinafter we denote \(N = \max(R)\) for the complexity analysis.

Fix the segment and do DP (Dynamic Programming): \(\mathrm{O}(N^4)\)

Fix a necessary segment \(\lbrack A, A + 1\rbrack\). Then one can compute the following DP:

\(dp_{i,j,k}\) : the probability that among the first \(i\) random variables, \(j\) variables are greater than \(A+1\) and \(k\) variables are in the segment \([A, A+1]\)

which can be computed in a total of \(O(N^3)\) time. Moreover, the result of DP can be boiled down to the answer by solving the following problem: “what is the expected value of \((K - j)\)-th largest value of \(k\) variables in a segment?” By the example problem above, it can be converted in an \(O(1)\) time.
Therefore, the problem has been solved in a total of \(O(N^4)\) time.

Compute the cumulative distribution function and update their difference: \(\mathrm{O}(N^3)\)

Let \(X := \mathrm{kthmax}(\lbrace X_1, X_2, \dots,X_N \rbrace)\). Let us consider the probability that \(X\) is greater than \(x\). If we can compute this, then we can solve the original problem by integrating \(F(x)\).

\(F(x)\) can be represented by functions \(F_i(x) :=\) (the probability that \(X_i\) is greater than \(x\)) as follows:

\[F(x) = \sum_{S \subseteq X\lbrace 1,2,\dots,N\rbrace, |S| = K} \prod_{i \in S} F_i(x)\]

Also, since \(F(x)\) depends on whether \(x\) is inside or outside the segment of \(X_1,\dots,X_N\), so by splitting the segment into \(\lbrack a, a + 1\rbrack\) \((0 \leq a \lt \max(A))\), it can be represented by a simple expression within each segment.

Therefore, the problem can be solved by computing \(F(x)\) for each segment in a total of \(\mathrm{O}(N^4)\) time. Moreover, by appropriately updating only the differences of \(F(x)\), one can reduce the computational complexity to \(\mathrm{O}(N^3)\).

Optimization through divide and conquer: \(\mathrm{O}(N^2 \log^2 N)\)

One may optimize the DP above with divide-and-conquer accompanied by Taylor Shift, to a total of \(\mathrm{O}(N^2 \log ^2 N)\) time with a fairly large constant factor.