Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
Planning algorithms based on lookahead search have achieved remarkable successes in artificial intelligence. Human world champions have been defeated in classic games such as checkers, chess, Go and poker [3, 26], and planning algorithms have had real-world impact in applications from logistics to chemical synthesis. However, these planning algorithms all rely on knowledge of the environment's dynamics, such as the rules of the game or an accurate simulator, preventing their direct application to real-world domains like robotics, industrial control, or intelligent assistants. Model-based reinforcement learning (RL) aims to address this issue by first learning a model of the environment's dynamics, and then planning with respect to the learned model. Typically, these models have either focused on reconstructing the true environmental state [8, 16, 24], or the sequence of full observations [14, 20]. However, prior work [4, 14, 20] remains far from the state of the art in visually rich domains, such as Atari 2600 games. Instead, the most successful methods are based on model-free RL [9, 21, 18] - i.e. they estimate the optimal policy and/or value function directly from interactions with the environment. However, model-free algorithms are in turn far from the state of the art in domains that require precise and sophisticated lookahead, such as chess and Go. In this paper, we introduce MuZero, a new approach to model-based RL that achieves state-of-the-art performance in Atari 2600, a visually complex set of domains, while maintaining superhuman performance in precision planning tasks such as chess, shogi and Go.
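The two-step recipe of model-based RL described above (learn a model of the dynamics, then plan against it) can be illustrated with a minimal sketch. This is not MuZero's method: it assumes a hypothetical five-state chain environment, swaps in a tabular model estimated from counts, and plans with plain value iteration. All names here (`true_step`, `learn_model`, `plan`, the constants) are invented for illustration.

```python
import random
from collections import defaultdict

N_STATES = 5       # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)


def true_step(state, action):
    """Real environment dynamics, never read directly by the planner."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def learn_model(n_samples=2000, seed=0):
    """Step 1: estimate transition probabilities and mean rewards from samples."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for _ in range(n_samples):
        s = rng.randrange(N_STATES - 1)  # any non-terminal state
        a = rng.choice(ACTIONS)
        s2, r = true_step(s, a)
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    model = {}
    for key, nexts in counts.items():
        total = sum(nexts.values())
        probs = {s2: c / total for s2, c in nexts.items()}
        model[key] = (probs, reward_sum[key] / total)
    return model


def plan(model, sweeps=100):
    """Step 2: value iteration that queries only the learned model."""
    values = [0.0] * N_STATES  # the terminal state keeps value 0

    def q(s, a):
        probs, r = model[(s, a)]
        return r + GAMMA * sum(p * values[s2] for s2, p in probs.items())

    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            values[s] = max(q(s, a) for a in ACTIONS)
    policy = [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES - 1)]
    return values, policy


model = learn_model()
values, policy = plan(model)
```

In this toy setting the planner never calls `true_step`; it sees the environment only through the sampled transitions, which is the separation between learning and planning that the passage describes.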
If you want to learn how one of the most sophisticated AI systems ever built works, you've come to the right place. In this three-part series, we'll explore the inner workings of the DeepMind MuZero model -- the younger (and even more impressive) brother of AlphaZero. We'll be walking through the pseudocode that accompanies the MuZero paper -- so grab yourself a cup of tea and a comfy chair and let's begin. On 19th November 2019, DeepMind released its latest model-based reinforcement learning algorithm to the world -- MuZero. This is the fourth in a line of DeepMind reinforcement learning papers that have continually smashed through the barriers of possibility, starting with AlphaGo in 2016.
Roundup
If you can't get enough of machine learning news, here's a roundup of extra tidbits to keep your addiction ticking away. Read on to learn how DeepMind is helping Google's Play Store, and about a new virtual environment from OpenAI for training agents safely. An AI recommendation system for the Google Play Store: DeepMind is helping Android users find new apps in the Google Play Store with the help of machine learning. "We started collaborating with the Play store to help develop and improve systems that determine the relevance of an app with respect to the user," the London-based lab said this week. Engineers built a model known as a candidate generator.
Every week, my team at Invector Labs publishes a newsletter that covers the most recent developments in AI research and technology. You can find this week's issue, and sign up for the newsletter, below. Gaming is one of the areas in which AI has shown the most progress in the last few years. From checkers to Go to StarCraft, AI programs have regularly achieved superhuman performance and shown signs of creativity.
In a preprint paper published this week by DeepMind, Google parent company Alphabet's U.K.-based research division, a team of scientists describes Agent57, which they say is the first system that outperforms humans on all 57 Atari games in the Arcade Learning Environment data set. Assuming the claim holds water, Agent57 could lay the groundwork for more capable AI decision-making models than have been previously released. This could be a boon for enterprises looking to boost productivity through workplace automation; imagine AI that not only automatically completes mundane, repetitive tasks like data entry, but also reasons about its environment. "With Agent57, we have succeeded in building a more generally intelligent agent that has above-human performance on all tasks in the Atari57 benchmark," wrote the study's coauthors. "Agent57 was able to scale with increasing amounts of computation: the longer it trained, the higher its score got."