UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Pu, Yuan, Niu, Yazhe, Ren, Jiyuan, Yang, Zhenjie, Li, Hongsheng, Liu, Yu
arXiv.org Artificial Intelligence
Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly. We identify that this is partially due to the entanglement of latent representations with historical information, which makes them incompatible with auxiliary self-supervised state regularization. To overcome this limitation, we present UniZero, a novel approach that disentangles latent states from implicit latent history using a transformer-based latent world model. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and the policy, facilitating broader and more efficient planning in latent space. We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark. Furthermore, it significantly outperforms prior baselines on benchmarks that require long-term memory. Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results. The code is available at https://github.com/opendilab/LightZero.
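To make the "disentangled latent state plus latent history" idea concrete, here is a minimal toy sketch in pure Python. It is an illustration under stated assumptions, not the authors' implementation (their code is in the LightZero repository). All names (`encode`, `causal_context`, `predict`, `consistency_loss`) and all arithmetic are hypothetical: the learned encoder and transformer are replaced by fixed functions, the reward and value heads are placeholders, and there is no training loop. The sketch shows three things: each latent state is computed from a single observation; history is carried by causal attention over the sequence of latent tokens; and a self-supervised consistency target (the encoding of the next observation) is therefore well defined.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def encode(obs: Sequence[float]) -> Vector:
    """Hypothetical encoder: maps ONE observation to a latent state.

    The output depends only on the current observation, never on earlier
    steps, i.e. the latent state is disentangled from history."""
    return [math.tanh(x) for x in obs]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_context(tokens: Sequence[Vector], t: int) -> Vector:
    """Single-head causal self-attention without learned projections.

    Token t attends only to tokens 0..t, so the history lives in this
    context vector instead of being folded into each latent state."""
    query = tokens[t]
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, tokens[i])) for i in range(t + 1)]
    weights = softmax(scores)
    return [sum(w * tokens[i][d] for i, w in enumerate(weights)) for d in range(len(query))]


def predict(context: Vector) -> Tuple[Vector, float, float]:
    """Hypothetical output heads reading the same context: next latent,
    reward and value (toy placeholder arithmetic, not trained heads)."""
    next_latent = [math.tanh(c) for c in context]
    reward = sum(context) / len(context)
    value = 0.5 * reward
    return next_latent, reward, value


def consistency_loss(observations: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Mean squared error between the predicted next latent and
    encode(next observation), averaged over transitions."""
    if len(observations) < 2 or len(actions) != len(observations):
        raise ValueError("need >= 2 observations and one action per observation")
    latents = [encode(o) for o in observations]
    # Hypothetical token: latent state shifted by a scalar action embedding.
    tokens = [[z + 0.1 * a for z in lat] for lat, a in zip(latents, actions)]
    total = 0.0
    for t in range(len(observations) - 1):
        pred_next, _reward, _value = predict(causal_context(tokens, t))
        target = latents[t + 1]  # single-frame target, independent of history
        total += sum((p - q) ** 2 for p, q in zip(pred_next, target)) / len(target)
    return total / (len(observations) - 1)


if __name__ == "__main__":
    obs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]]
    acts = [0, 1, 0]
    print(f"consistency loss: {consistency_loss(obs, acts):.4f}")
```

In this toy setup, the regression target at step t+1 is `encode` applied to one frame, so it does not depend on how earlier steps were summarized. This is one reading of why disentangling latent states from history makes self-supervised state regularization compatible with the model. A real transformer world model would add learned projections, multiple layers, and losses for the reward, value, and policy heads.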
Jun-15-2024