A Appendix


Training As for the training phase, we train the model from scratch without any human expert data, the same setting as for the Atari games. Besides, due to limited GPU resources, we do not use the reanalyzing mechanism of MuZero [27] and EfficientZero [34], which recomputes the target values and policies from trajectories in the replay buffer with the current, fresher model. Specifically, we use 6 GPUs for self-play to collect data, 1 GPU for training, and 1 GPU for evaluation.

Exploration To achieve better exploration on Go, we reduce the α in the Dirichlet noise Dir(α) from 0.3 to 0.03, and we scale the exploration noise by the typical number of legal actions, following prior work [31, 27].
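A minimal sketch of this noise-scaling scheme is given below; scaling α inversely with the branching factor keeps the amount of noise per legal move roughly comparable across games. The function name, the mixing fraction eps = 0.25, and the reference point of 362 legal moves on 19x19 Go are illustrative assumptions, not the exact implementation used here.

```python
import numpy as np

def add_exploration_noise(prior, legal_actions, eps=0.25, alpha_ref=0.03, n_ref=362):
    """Mix Dirichlet noise into the root prior over legal actions.

    Illustrative sketch: the concentration alpha is scaled inversely with
    the number of legal actions, anchored at alpha_ref = 0.03 for roughly
    n_ref = 362 legal moves on 19x19 Go (eps, alpha_ref, n_ref are assumed
    values for this example).
    """
    alpha = alpha_ref * n_ref / len(legal_actions)       # e.g. ~0.03 on 19x19 Go
    noise = np.random.dirichlet([alpha] * len(legal_actions))
    noisy = prior.copy()
    for i, a in enumerate(legal_actions):
        # Blend the network prior with the sampled Dirichlet noise.
        noisy[a] = (1 - eps) * prior[a] + eps * noise[i]
    return noisy
```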