Muesli: Combining Improvements in Policy Optimization
Hessel, Matteo; Danihelka, Ivo; Viola, Fabio; Guez, Arthur; Schmitt, Simon; Sifre, Laurent; Weber, Theophane; Silver, David; van Hasselt, Hado
arXiv.org Artificial Intelligence
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
Apr-13-2021
- Country:
- Asia (0.14)
- Europe > United Kingdom
- England (0.14)
- North America > United States (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.30)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Learning Graphical Models > Undirected Networks
- Markov Models (0.67)
- Neural Networks > Deep Learning (1.00)
- Reinforcement Learning (1.00)
- Representation & Reasoning
- Agents (0.67)
- Optimization (0.93)
- Search (0.68)
- Uncertainty (0.68)