Collaborating Authors

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model Machine Learning

Planning algorithms based on lookahead search have achieved remarkable successes in artificial intelligence. Human world champions have been defeated in classic games such as checkers [34], chess [5], Go [38] and poker [3, 26], and planning algorithms have had real-world impact in applications from logistics [47] to chemical synthesis [37]. However, these planning algorithms all rely on knowledge of the environment's dynamics, such as the rules of the game or an accurate simulator, preventing their direct application to real-world domains like robotics, industrial control, or intelligent assistants. Model-based reinforcement learning (RL) [42] aims to address this issue by first learning a model of the environment's dynamics, and then planning with respect to the learned model. Typically, these models have either focused on reconstructing the true environmental state [8, 16, 24], or the sequence of full observations [14, 20]. However, prior work [4, 14, 20] remains far from the state of the art in visually rich domains, such as Atari 2600 games [2]. Instead, the most successful methods are based on model-free RL [9, 21, 18] - i.e. they estimate the optimal policy and/or value function directly from interactions with the environment. However, model-free algorithms are in turn far from the state of the art in domains that require precise and sophisticated lookahead, such as chess and Go. In this paper, we introduce MuZero, a new approach to model-based RL that achieves state-of-the-art performance in Atari 2600, a visually complex set of domains, while maintaining superhuman performance in precision planning tasks such as chess, shogi and Go.

DeepMind's latest AI can master games without being told their rules


In 2016, Alphabet's DeepMind came out with AlphaGo, an AI which consistently beat the best human Go players. One year later, the subsidiary went on to refine its work, creating AlphaGo Zero. Where its predecessor learned to play Go by observing amateur and professional matches, AlphaGo Zero mastered the ancient game by simply playing against itself. DeepMind then created AlphaZero, which could play Go, chess and shogi with a single algorithm. What tied all those AIs together is that they knew the rules of the games they had to master going into their training.

DeepMind gets good at games (and choosing them) – plus more bits and bytes from the world of machine learning


Roundup If you can't get enough of machine learning news then here's a roundup of extra tidbits to keep your addiction ticking away. Read on to learn more about how DeepMind is helping Google's Play Store, and a new virtual environment to train agents safely from OpenAI. An AI recommendation system for the Google Play Store: Deepmind are helping Android users find new apps in the Google Play Store with the help of machine learning. "We started collaborating with the Play store to help develop and improve systems that determine the relevance of an app with respect to the user," the London-based lab said this week. Engineers built a model known as a candidate generator.

How To Build Your Own MuZero AI Using Python (Part 1/3)


If you want to learn how one of the most sophisticated AI systems ever built works, you've come to the right place. In this three part series, we'll explore the inner workings of the DeepMind MuZero model -- the younger (and even more impressive) brother of AlphaZero. We'll be walking through the pseudocode that accompanies the MuZero paper -- so grab yourself a cup of tea and a comfy chair and let's begin. On 19th November 2019 DeepMind released their latest model-based reinforcement learning algorithm to the world -- MuZero. This is the fourth in a line of DeepMind reinforcement learning papers that have continually smashed through the barriers of possibility, starting with AlphaGo in 2016.

Monte-Carlo Tree Search as Regularized Policy Optimization Machine Learning

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.