A Appendix


Training As for the training phase, we train the model from scratch without any human expert data, the same setting as for the Atari games. Besides, due to limited GPU resources, we do not use the reanalyzing mechanism of MuZero [27] and EfficientZero [34], which recomputes the target values and policies from trajectories in the replay buffer with the current, fresher model. Specifically, we use 6 GPUs for self-play to collect data, 1 GPU for training, and 1 GPU for evaluation.

Exploration To achieve better exploration on Go, we reduce the α in the Dirichlet noise Dir(α) from 0.3 to 0.03, and we scale the exploration noise by the typical number of legal actions, following prior work [31, 27].
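A minimal sketch of this noise-scaling scheme is given below; scaling α inversely with the branching factor keeps the amount of noise per legal move roughly comparable across games. The function name, the mixing fraction eps = 0.25, and the reference point of 362 legal moves on 19x19 Go are illustrative assumptions, not the exact implementation used here.

```python
import numpy as np

def add_exploration_noise(prior, legal_actions, eps=0.25, alpha_ref=0.03, n_ref=362):
    """Mix Dirichlet noise into the root prior over legal actions.

    Illustrative sketch: the concentration alpha is scaled inversely with
    the number of legal actions, anchored at alpha_ref = 0.03 for roughly
    n_ref = 362 legal moves on 19x19 Go (eps, alpha_ref, n_ref are assumed
    values for this example).
    """
    alpha = alpha_ref * n_ref / len(legal_actions)       # e.g. ~0.03 on 19x19 Go
    noise = np.random.dirichlet([alpha] * len(legal_actions))
    noisy = prior.copy()
    for i, a in enumerate(legal_actions):
        # Blend the network prior with the sampled Dirichlet noise.
        noisy[a] = (1 - eps) * prior[a] + eps * noise[i]
    return noisy
```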