F - Lost and Pound Editorial by en_translator
We will call sequence of steps 1, 2, and 3 a round.
Let \(\mathrm{DP}[t][k]\) be the winning property when the button has been pressed \(k\) times after the \(t\)-th round. We compute this DP (Dynamic Programming) in descending order of \(t\).
With \(\mathrm{DP}[t+1][*]\) known, how can we find \(\mathrm{DP}[t][k]\)? If he presses button \(i\) \(c_i\) times for each \(i=1,\dots,N\) in the \((t+1)\)-th round, then the probability that each button is the winning button is \(\frac{1}{N}\), so the winning probability is \(\frac{1}{N}\sum_{i=1}^{N} \mathrm{DP}[t+1][k+c_i]\). Thus, considering all combinations of the numbers of pressing buttons, we obtain the following recurrence relations:
\(\displaystyle \mathrm{DP}[t][k]=\max_{c_1+\dots+c_N=M}\frac{1}{N}\sum_{i=1}^{N} \mathrm{DP}[t+1][k+c_i]\)
We will consider how to evaluate it fast. Fix \(t\) and \(k\) and define
\(\displaystyle \mathrm{DP}'_{t,k}[n][s]=\max_{c_1+\dots+c_n=s}\sum_{i=1}^n \mathrm{DP}[t+1][k+c_i].\)
Then by definition, \(\mathrm{DP}[t][k]=\frac{1}{N}\mathrm{DP}'_{t,k}[N][M]\). Since the order in \(c\) does not affect to \(\mathrm{DP}'_{t,k}[N][M]\), it is sufficient to assume that \((c_i=0 \land i\leq i') \Rightarrow c_{i'}=0\), that is, indices \(i\) with \(c_i=0\) occur all at once in the end. Considering how many non-zero elements \(c\) have, we define
\(\displaystyle \mathrm{DP}''_{t,k}[n][s]=\max_{\substack{c_1+\dots+c_n=s\\ c_i>0}}\sum_{i=1}^n \mathrm{DP}[t+1][k+c_i]\)
to obtain \(\displaystyle \mathrm{DP}'_{t,k}[N][M]=\max_{1\leq n\leq \min(M,N)} (\mathrm{DP}''_{t,k}[n][M]+(N-n)\mathrm{DP}[t+1][k])\). By computing \(\mathrm{DP}''[n][M]\) within the range \(1\leq n \leq \min(M,N)\), we can find \(\mathrm{DP}'_{t,k}[N][M]\). The DP of finding \(\mathrm{DP}''\) can be processed in a total of \(O(M^3)\) time, with \(O(M^2)\) states, and each transition costing \(O(M)\).
By performing it for all \(t\) and \(k\), the problem has been solved in a total of \(O(TKM^3)\) time.
posted:
last update: