Muesli: Combining Improvements in Policy Optimization
Hessel, Matteo; Danihelka, Ivo; Viola, Fabio; Guez, Arthur; Schmitt, Simon; Sifre, Laurent; Weber, Theophane; Silver, David; van Hasselt, Hado
arXiv.org Artificial Intelligence
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
Apr-13-2021
- Country:
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom
- England
- Greater London > London (0.04)
- Cambridgeshire > Cambridge (0.04)
- Asia
- Middle East > Jordan (0.04)
- China (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.30)
- Technology:
- Information Technology > Artificial Intelligence
- Representation & Reasoning
- Optimization (0.93)
- Uncertainty (0.68)
- Search (0.68)
- Agents (0.67)
- Machine Learning
- Reinforcement Learning (1.00)
- Neural Networks > Deep Learning (1.00)
- Learning Graphical Models > Undirected Networks
- Markov Models (0.67)