Online and Offline Reinforcement Learning by Planning with a Learned Model

Julian Schrittwieser

Neural Information Processing Systems 

Reanalyse was briefly introduced in the context of MuZero (Schrittwieser et al., 2020), but limited to