Game Solving with Online Fine-Tuning

Neural Information Processing Systems

A.1 PCN training

We largely follow the PCN training method of Wu et al. [1], but replace the AlphaZero algorithm with the Gumbel AlphaZero algorithm [2], where the simulation count is set to 322 in self-play, starting by sampling 16 actions. The PCN architecture contains three residual blocks with 256 hidden channels. A total of 400,000 self-play games are generated over the whole training. During optimization, the learning rate is fixed at 0.02, and the batch size is set to 1,024.

A.3 Worker design

Each worker is itself a Killall-Go solver. To fully utilize GPU resources, we implement batched GPU inference to accelerate PCN evaluations across workers.