Neural Information Processing Systems
The original MuZero did not use sticky actions (Machado et al., 2017) in its Atari experiments; with sticky actions, there is a 25% chance that the selected action is ignored and the previous action is repeated instead. For all experiments in this work, we used a network architecture based on the one introduced by MuZero (Schrittwieser et al., 2020). To implement the network, we used the modules provided by the Haiku neural network library (Hennigan et al., 2020). We did not observe any benefit from using a Gaussian mixture, so in all our experiments we instead used a single Gaussian with diagonal covariance. All experiments were trained with the Adam optimiser (Kingma & Ba, 2015) with decoupled weight decay (Loshchilov & Hutter, 2017).
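The sticky-action protocol mentioned above can be sketched as a small wrapper around action selection. This is a minimal illustration, not the Arcade Learning Environment implementation: the class name `StickyActions`, its `reset`/`select` methods, and the use of a no-op as the "previous" action at the start of an episode are assumptions made for this sketch. Here the repeated action is taken to be the one last *executed*, not the one last chosen by the agent.

```python
import random


class StickyActions:
    """Hypothetical helper: with probability `repeat_prob`, ignore the agent's
    chosen action and repeat the previously executed action instead."""

    def __init__(self, repeat_prob=0.25, seed=None):
        self.repeat_prob = repeat_prob
        self.rng = random.Random(seed)
        self.prev_action = None  # no action executed yet

    def reset(self, noop_action=0):
        # Assumption: at episode start, the "previous" action is a no-op.
        self.prev_action = noop_action

    def select(self, agent_action):
        """Return the action actually executed in the environment."""
        if self.prev_action is not None and self.rng.random() < self.repeat_prob:
            executed = self.prev_action  # agent's choice is ignored
        else:
            executed = agent_action
        self.prev_action = executed  # remember what was really executed
        return executed
```

With `repeat_prob=0.25`, roughly one step in four executes the previous action, matching the 25% setting described in the text.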
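For reference, the log-density of a single Gaussian with diagonal covariance factorises into a sum of independent one-dimensional terms, $\log \mathcal{N}(x;\mu,\operatorname{diag}(\sigma^2)) = -\tfrac{1}{2}\sum_i \big[(x_i-\mu_i)^2/\sigma_i^2 + 2\log\sigma_i + \log 2\pi\big]$. The sketch below evaluates this in plain Python. The function name `diag_gaussian_log_prob` is hypothetical, and parameterising the scale by its logarithm (`log_std`) is an assumption for numerical convenience, not something stated in the text.

```python
import math


def diag_gaussian_log_prob(x, mean, log_std):
    """Log-density of x under N(mean, diag(exp(log_std) ** 2)).

    Hypothetical helper: each dimension contributes an independent
    1-D Gaussian term because the covariance is diagonal.
    """
    if not (len(x) == len(mean) == len(log_std)):
        raise ValueError("x, mean and log_std must have the same length")
    total = 0.0
    for x_i, mu_i, ls_i in zip(x, mean, log_std):
        z = (x_i - mu_i) / math.exp(ls_i)  # standardised residual
        total += -0.5 * z * z - ls_i - 0.5 * math.log(2.0 * math.pi)
    return total
```

Compared with a mixture, this needs only a mean and a scale per dimension and no mixture weights, which keeps both the network head and the likelihood computation simple.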
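Decoupled weight decay differs from adding an L2 penalty to the loss: the decay term is applied directly to the parameters instead of being passed through Adam's adaptive rescaling. The sketch below shows a single update step on a flat list of scalar parameters. The function names `adamw_init`/`adamw_step` and all default hyperparameter values are illustrative assumptions, since the text does not report the learning rate or decay coefficient. Scaling the decay by the learning rate follows the common convention used by, for example, PyTorch's `AdamW` and Optax's `optax.adamw`.

```python
import math


def adamw_init(num_params):
    """Zero first/second moment estimates and a step counter of 0."""
    return [0.0] * num_params, [0.0] * num_params, 0


def adamw_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=1e-4):
    """One Adam step with decoupled weight decay; returns (params, state)."""
    m, v, t = state
    t += 1
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and squared gradient.
        m_i = b1 * m_i + (1.0 - b1) * g
        v_i = b2 * v_i + (1.0 - b2) * g * g
        # Bias correction for the zero initialisation of the moments.
        m_hat = m_i / (1.0 - b1 ** t)
        v_hat = v_i / (1.0 - b2 ** t)
        adam_update = m_hat / (math.sqrt(v_hat) + eps)
        # Decoupled decay: weight_decay * p is added outside the adaptive
        # rescaling, rather than being folded into the gradient g.
        new_params.append(p - lr * (adam_update + weight_decay * p))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, (new_m, new_v, t)
```

Because the decay term bypasses the division by `sqrt(v_hat)`, parameters with large historical gradients are decayed at the same rate as those with small ones; with an L2 penalty inside Adam, such parameters would be regularised less.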
Feb-11-2026, 16:41:22 GMT