HACK TO THE FUTURE 2023 qual（AtCoder Heuristic Contest 016）

A - Graphorean

Time Limit: 5 sec / Memory Limit: 1024 MiB

ストーリー

天才発明家の高橋くんは、過去へとグラフを送ることの出来るタイムマシン「グラフォリアン:Graphorean」(グラフ:Graph + デロリアン:DeLorean)を発明した。このマシンを使って一攫千金を目論む高橋くんは、カジノのルーレットをプレイし、プレイ前の時刻へと当選番号の情報を送ろうと考えた。成功すれば見事番号を的中させた世界線へと移動し、大金持ちになることが出来る。

ただし、マシンでは直接数値を送ることは出来ないため、当選番号の数値を一度グラフに変換(エンコード)して送信し、送られてきたグラフを数値に戻す(デコード)作業が必要である。マシンでグラフを送ると、頂点番号の情報は失われ、更にノイズが入ってしまうため、正しく数値が復元できるように、エンコード・デコードの仕方を工夫しなければならない。また、N 頂点のグラフを受け取るためには、受け取る日時において予めマシンに整数 N を設定しておく必要があるため、送るグラフの頂点数はあらかじめ決めておく必要がある。

マシンは一度使うと壊れてしまうので、失敗は許されない。事前にシミュレーションをすることで成功確率を見積もり、出来るだけ成功確率が高くなるようなエンコード・デコードの方式を準備することにした。また、大きなグラフを送るには膨大なエネルギーが必要となるため、送るグラフのサイズは出来るだけ小さい方が望ましい。高橋くんの手伝いをして欲しい。

問題文

整数 M とエラー率 \epsilon が与えられるので、4\leq N\leq 100 の整数 N を決めて、頂点数が全て N であるような M 個の無向グラフ G_0,G_1,\cdots,G_{M-1} を作成し出力せよ。グラフは非連結でも構わない。その後、以下のクエリを 100 回処理せよ。

k 回目のクエリでは、N 頂点の無向グラフ H_k が与えられる。 H_k は、ある G_{s_k} から以下のようにして生成されたグラフである。

H_k=G_{s_k} と初期化する。
各 0\leq i<j\leq N-1 の組 (i,j) について、確率 \epsilon でH_k が辺 (i,j) を含むか否かを反転する。
H_k の頂点の順番をランダムに並び替える。

H_k の情報を受け取ったら、どのグラフ G_{s_k} から生成されたかを予測し、s_k の予測値 t_k を出力せよ。

得点

予測が外れた回数を E としたとき、以下の点数が得られる。 \[ \mathrm{round}\left(10^9\times \frac{0.9^E}{N}\right) \]

テストケース数

暫定テスト: 50個
システムテスト: 2000個、コンテスト終了後に seeds.txt (sha256=4093b6cb740beea16eb0ecf55120ca6ca6fbef18015ea4a863e64d0bea3de91d) を公開
システムテストは各 (M,\epsilon) の組を高々1つ含む

各テストケース毎に、\mathrm{round}(10^9\times \frac{自身の得点}{全参加者中の最高点}) の相対評価スコアが得られ、その和が提出の得点となる。

最終順位はコンテスト終了後に実施されるより多くの入力に対するシステムテストにおける得点で決定される。暫定テスト、システムテストともに、不正な出力や制限時間超過をした場合、そのテストケースのみ0点となる。システムテストは CE 以外の結果を得た一番最後の提出に対してのみ行われるため、最終的に提出する解答を間違えないよう注意せよ。

相対評価システムについて

暫定テスト、システムテストともに、CE 以外の結果を得た一番最後の提出のみが順位表に反映される。相対評価スコアの計算に用いられる各テストケース毎の全参加者中の最高点の算出においても、順位表に反映されている最終提出のみが用いられる。

順位表に表示されているスコアは相対評価スコアであり、新規提出があるたびに、相対評価スコアが再計算される。一方、提出一覧から確認出来る各提出のスコアは各テストケース毎の得点をそのまま足し合わせた絶対評価スコアであり、相対評価スコアは表示されない。最新以外の提出の、現在の順位表における相対評価スコアを知るためには、再提出が必要である。 (追記)不正な出力や制限時間超過をした場合、提出一覧から確認出来る絶対スコアは0となっているが、順位表には正解したテストケースに対する相対スコアの和が表示されている。

実行時間について

実行時間には多少のブレが生じます。また、システムテストでは同時に大量の実行を行うため、暫定テストに比べて数%程度実行時間が伸びる現象が確認されています。そのため、実行時間制限ギリギリの提出がシステムテストでTLEとなる可能性があります。プログラム内で時間を計測して処理を打ち切るか、実行時間に余裕を持たせるようお願いします。

入出力

まずはじめに、標準入力から以下の形式で問題設定に関する情報が与えられる。

M \epsilon

M は出力グラフ数を表す整数で、10\leq M\leq 100 を満たす。
\epsilon はエラー率を表す実数で、0.01 の倍数であり、0.00\leq \epsilon\leq 0.4 を満たす。

入力を読み込んだら、M 個のグラフ G_0,G_1,\cdots,G_{M-1} を以下の形式で標準出力に出力せよ。

N
g_0
\vdots
g_{M-1}

N は各グラフの頂点数を表す整数で、4\leq N\leq 100 を満たさなければならない。
各 g_k は k 番目のグラフ G_k を表す N(N-1)/2 文字の01文字列であり、0\leq i<j\leq N-1 を満たす各 (i,j) について、G_k が辺 (i,j) を含むならば 1、含まないならば 0 として (i,j) の辞書順に並べたものである。例えば、N=4 のとき、文字列 100101 は 4 点が直線上につながったグラフを表す。

出力の後には標準出力を flush しなければならない。 そうしない場合、TLEとなる可能性がある。 M 個のグラフの出力後、以下の処理を 100 回繰り返す。

k 回目の処理ではまず、N 頂点のグラフ H_k を先と同じ形式により N(N-1)/2 文字の 01 文字列として表したものが一つ、標準入力から一行で与えられる。入力を受け取ったら、H_k がどのグラフ G_{s_k} から生成されたものであるかを予測し、s_k の予測値 t_k (0\leq t_k\leq M-1) を標準出力に一行で出力せよ。出力の後には改行をし、更に標準出力を flush しなければならない。 k 回目の処理を完了するまで、k+1 回目の処理に対する入力は与えられないので注意せよ。

サンプルプログラム

Pythonでの解答例を示す。このプログラムでは、N=20 と設定し、グラフ G_k は k 本の辺を含むようにしている。 H_k を受け取ったら、辺の数 m を数え、\min(m, M-1) を出力している。

M, eps = input().split()
M = int(M)
eps = float(eps)
print(20)
for k in range(M):
    print("1" * k + "0" * (190 - k))

for q in range(100):
    H = input()
    t = min(H.count('1'), M - 1)
    print(t)

入力生成方法

L 以上 U 以下の整数値を一様ランダムに生成する関数を \mathrm{rand}(L,U) で表す。 M は \mathrm{rand}(10,100) により生成される。 \epsilon は 0.01\times \mathrm{rand}(0,40) により生成される。各 s_k は \mathrm{rand}(0,M-1) により生成される。

ツール(入力ジェネレータ・ビジュアライザ・ローカルテスタ)

Web版: ビジュアライズ機能により各入出力グラフの形状を確認出来ます。
ローカル版: 使用するにはRust言語のコンパイル環境をご用意下さい。
- Windows用のコンパイル済みバイナリ: Rust言語の環境構築が面倒な方は代わりにこちらをご利用下さい。

コンテスト期間中に、ビジュアライズ結果の共有や、解法・考察に関する言及は禁止されています。ご注意下さい。

ツールで用いられる入出力ファイルの仕様

ローカルテスタに与える入力ファイルは以下の形式を用いている。

M \epsilon
s_0
\vdots
s_{99}
\mathrm{seed}

最後の \mathrm{seed} はノイズ生成に用いられる乱数シード値である。各グラフ H_0,\cdots,H_{99} は出力されたグラフ G_0,\cdots,G_{M-1} に依存するため、入力ファイルには乱数シード値のみが記載されている。

ローカルテスタは解答プログラムの出力をそのまま出力ファイルに書き出す。解答プログラムは、# から始まるコメント行を出力に含めても良い。 Web版ビジュアライザを使用すると、コメント行を対応するクエリと合わせて表示出来るため、デバッグや考察等に役立てることが出来る。ジャッジプログラムはコメント行を全て無視するため、コメント行を出力するプログラムをそのまま提出可能である。コメント行のうち、以下で始まるものはビジュアライザで特別扱いされる。

#v: 以下の形式で出力することで、H_k における頂点 i が G_{t_k} における頂点 p_i に対応すると予測したことをビジュアライザに伝えることが出来る。

#v p_0 \cdots p_{N-1}

#h: 提供されているローカルテスタを用いない場合に #h 100101 001101 のように出力することで、ビジュアライザが表示する H_k を置き換えることが出来る。左側が G_{s_k} にノイズを加えた後で、右側が更に頂点の順番をランダムに並び替えたものである。

Story

Takahashi, a genius inventor, invented the time machine called "Graphorean" (Graph + DeLorean) that can send a graph to the past. With this machine, he plans to get rich by playing casino roulette and sending the winning number information to the time before he plays. If he succeeds, he will move to the world line where he has successfully chosen the winning number and become very rich.

Because the machine cannot send the winning number directly, he needs to first convert the number into a graph (encoding), then send it, and finally convert the graph back into a number (decoding). Sending a graph by the machine loses the vertex number information and introduces noise, so he must develop encoding/decoding methods so that the number can be correctly restored. In order to receive a graph with N vertices, he must set an integer N to the machine. Therefore, the number of vertices of the graph to be sent must be determined in advance.

The time machine will be broken once he uses it, so failure will not be tolerated. Therefore, he decided to estimate the success probability by conducting simulations in advance and preparing an encoding/decoding method with as high a success probability as possible. Furthermore, because sending a large graph requires a huge amount of energy, it is desirable that the graph's size be as small as possible. Please help him.

Problem Statement

Given an integer M and an error rate \epsilon, determine an integer N satisfying 4\leq N\leq 100 and output N-vertex undirected graphs G_0,G_1,\cdots,G_{M-1}. The graphs may be disconnected. Then process the following query 100 times.

In the k-th query, you are given an N-vertex undirected graph H_k. H_k is a graph generated from some G_{s_k} as follows.

Initialize H_k=G_{s_k}.
For each (i,j) with 0\leq i<j\leq N-1, flip whether or not H_k contains edge (i,j) with probability \epsilon.
Randomly shuffle the order of the vertices in H_k.

After receiving H_k, predict from which graph G_{s_k} it was generated, and output the predicted value t_k of s_k.

Scoring

Let E be the number of failed predictions. Then the score for the test case is

\[ \mathrm{round}\left(10^9\times \frac{0.9^E}{N}\right) \]

Number of test cases

Provisional test: 50
System test: 2000. We will publish seeds.txt (sha256=4093b6cb740beea16eb0ecf55120ca6ca6fbef18015ea4a863e64d0bea3de91d) after the contest is over.
System test contains at most one test case for each pair of (M,\epsilon).

For each test case, we compute the relative score \mathrm{round}(10^9\times \frac{\mathrm{YOUR}}{\mathrm{MAX}}), where YOUR is your score and MAX is the highest score among all competitors obtained on that test case. The score of the submission is the sum of the relative scores.

The final ranking will be determined by the system test with more inputs which will be run after the contest is over. In both the provisional/system test, if your submission produces illegal output or exceeds the time limit for some test cases, only the score for those test cases will be zero. The system test will be performed only for the last submission which received a result other than CE . Be careful not to make a mistake in the final submission.

About Relative Evaluation System

In both the provisional/system test, the standings will be calculated using only the last submission which received a result other than CE. Only the last submissions are used to calculate the highest score among all competitors for each test case in calculating the relative scores.

The scores shown in the standings are relative, and whenever a new submission arrives, all relative scores are recalculated. On the other hand, the score for each submission shown on the submissions page is an absolute score obtained by summing up the scores for each test case, and the relative scores are not shown. In order to know the relative score of submission other than the latest one in the current standings, you need to resubmit it. (Update) If your submission produces illegal output or exceeds the time limit for some test cases, the absolute score shown in the submissions page becomes 0, but the standings show the sum of the relative scores for the test cases that were answered correctly.

About execution time

Execution time may vary slightly from run to run. In addition, since system tests simultaneously perform a large number of executions, it has been observed that execution time increases by several percent compared to provisional tests. For these reasons, submissions that are very close to the time limit may result in TLE in the system test. Please measure the execution time in your program to terminate the process, or have enough margin in the execution time.

Input and Output

First, information about the problem setting is given from Standard Input in the following format.

M \epsilon

M is an integer representing the number of graphs, satisfying 10\leq M\leq 100.
\epsilon is a real number representing the error rate, which is a multiple of 0.01 and satisfies 0.00\leq \epsilon\leq 0.4.

After reading the input, output M graphs G_0,G_1,\cdots,G_{M-1} to Standard Output in the following format.

N
g_0
\vdots
g_{M-1}

N is an integer representing the number of vertices in each graph, which must satisfy 4\leq N\leq 100.
Each g_k is a string of length N(N-1)/2, which represents the k-th graph G_k as follows. For each (i,j) satisfying 0\leq i<j\leq N-1, express the existence of edge (i,j) as 1 if G_k contains edge (i,j) and 0 if it does not, using one character, and then arrange them in lexicographic order of (i,j). For example, when N=4, the string 100101 represents a graph with 4 vertices connected on a straight line.

After output, you have to flush Standard Output. Otherwise, the submission might be judged as TLE. After outputting M graphs, repeat the following process 100 times.

In the k-th process, you are given an N-vertex graph H_k represented as a string of N(N-1)/2 01 characters in the same format as above, in a single line from Standard Input. After receiving H_k, predict from which graph G_{s_k} H_k is generated and output the prediction t_k (0\leq t_k\leq M-1) of s_k to Standard Output. The output must be followed by a new line, and you have to flush Standard Output. Note that H_{k+1} will not be given until you output the t_k.

Sample Solution

This is a sample solution in Python. In this program, we set N=20, and each graph G_k contains k edges. For each H_k, we count the number of edges m and output \min(m, M-1).

M, eps = input().split()
M = int(M)
eps = float(eps)
print(20)
for k in range(M):
    print("1" * k + "0" * (190 - k))

for q in range(100):
    H = input()
    t = min(H.count('1'), M - 1)
    print(t)

Input Generation

Let \mathrm{rand}(L,U) be a function that generates a uniform random integer between L and U, inclusive. M is generated by \mathrm{rand}(10,100). \epsilon is generated by 0.01\times \mathrm{rand}(0,40). Each s_k is generated by \mathrm{rand}(0,M-1).

Tools (Input generator and visualizer)

Web version: You can see the visualization of each input/output graph.
Local version: You need a compilation environment of Rust language.
- Pre-compiled binary for Windows: If you are not familiar with the Rust language environment, please use this instead.

Please be aware that sharing visualization results or discussing solutions/ideas during the contest is prohibited.

Specification of input/output files used by the tools

Input files given to the local tester have the following format.

M \epsilon
s_0
\vdots
s_{99}
\mathrm{seed}

The last \mathrm{seed} is a random seed value used for noise generation. Since each graph H_0,\cdots,H_{99} depends on output graphs G_0,\cdots,G_{M-1}, the input file contains only the random seed value.

The local tester writes outputs from your program directly to the output file. Your program may output comment lines starting with #. The web version of the visualizer displays the comment lines with the corresponding query, which may be useful for debugging and analysis. Since the judge program ignores all comment lines, you can submit a program that outputs comment lines as is.

Comment lines that begin with the following have special meaning in the visualizer.

#v: You can tell the visualizer that you have predicted that vertex i in H_k corresponds to vertex p_i in G_{t_k} by outputting it in the following format.

#v p_0 \cdots p_{N-1}

#h: If you do not use the provided local tester, you can replace the H_k displayed by the visualizer by outputting in the form #h 100101 001101. The left is a graph after adding noise to G_{s_k}, and the right is a graph after shuffling the vertex ordering.

A-Final - Graphorean (System Test)

Time Limit: 0 msec / Memory Limit: 1024 MiB