

Reinforcement Learning in FlipIt Artificial Intelligence

Reinforcement learning has shown much success in games such as chess, backgammon and Go [1, 2, 3]. However, in most of these games, agents have full knowledge of the environment at all times. In this paper, we describe a deep learning model that successfully optimizes its score using reinforcement learning in a game with incomplete and imperfect information. We apply our model to FlipIt [4], a two-player game in which both players, the attacker and the defender, compete for ownership of a shared resource and receive information on the current state (such as the current owner of the resource or the time since the opponent last moved) only upon making a move. Our model is a deep neural network combined with Q-learning and is trained to maximize the defender's time of ownership of the resource. Despite the imperfect observations, our model successfully learns an optimal cost-effective counter-strategy and shows the advantages of using deep reinforcement learning in game-theoretic scenarios. Our results show that it outperforms the Greedy strategy against opponents whose moves follow periodic or exponential distributions, without any prior knowledge of the opponent's strategy, and we generalize the model to n-player games.
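
As a rough illustration of the setup described above, the following minimal sketch trains a defender with Q-learning against a periodic attacker. It is not the paper's model: a lookup table is swapped in for the deep neural network, time is discretized, and the only observation is the number of steps since the defender's own last flip. The function name `train_defender`, the reward of 1 per owned step, the move cost, and all hyperparameters are assumptions made for this example.

```python
import random


def train_defender(period=10, move_cost=2.0, steps=50_000, alpha=0.1,
                   gamma=0.9, epsilon=0.1, max_tau=20, seed=0):
    """Tabular Q-learning for a simplified, discrete-time FlipIt defender.

    Assumptions (not from the paper): the attacker flips every `period`
    steps, and the defender's state is only `tau`, the number of steps
    since its own last flip, capped at `max_tau`.
    """
    rng = random.Random(seed)
    # q[tau][action], where action 0 = wait and action 1 = flip
    q = [[0.0, 0.0] for _ in range(max_tau + 1)]
    tau = 0
    defender_owns = True

    for t in range(1, steps + 1):
        # Epsilon-greedy action selection
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[tau][0] >= q[tau][1] else 1

        # Periodic attacker takes the resource on its scheduled steps
        if t % period == 0:
            defender_owns = False

        reward = 0.0
        if action == 1:
            # Flipping costs `move_cost`; on a simultaneous move the
            # defender wins the tie in this sketch
            defender_owns = True
            reward -= move_cost
            next_tau = 0
        else:
            next_tau = min(tau + 1, max_tau)

        # One unit of reward for each step the defender owns the resource
        if defender_owns:
            reward += 1.0

        # Standard Q-learning update
        target = reward + gamma * max(q[next_tau])
        q[tau][action] += alpha * (target - q[tau][action])
        tau = next_tau

    return q
```

The greedy policy can be read off the table with `[0 if w >= f else 1 for w, f in train_defender()]`. Because the defender never observes the true owner between moves, the state here is only partially informative, which is the difficulty the abstract highlights.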

Standing on the shoulders of giants


When you think of AI or machine learning, you may picture AlphaZero or even a science fiction reference such as HAL-9000 from 2001: A Space Odyssey. However, the true forefather, who set the stage for all of this, was the great Arthur Samuel. Samuel was a computer scientist, visionary, and pioneer, who wrote the first checkers program for the IBM 701 in the early 1950s. His program, "Samuel's Checkers Program", was first shown to the general public on TV on February 24th, 1956, and the impact was so powerful that IBM stock went up 15 points overnight (a huge jump at that time). This program also helped set the stage for all the modern chess programs we have come to know so well, with features like look-ahead, an evaluation function, and a minimax search that he would later develop into alpha-beta pruning.

Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research Artificial Intelligence

Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally emergent curriculum, which we term an autocurriculum. The solution of one social task often begets new social tasks, continually generating novel challenges, and thereby promoting innovation. Under certain conditions these challenges may become increasingly complex over time, demanding that agents accumulate ever more innovations.

The Hanabi Challenge: A New Frontier for AI Research Machine Learning

From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay and imperfect information in a two to five player setting. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization Machine Learning

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimization problems, such as the traveling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of a two-dimensional bin packing problem show that it outperforms generic Monte Carlo tree search, heuristic algorithms and reinforcement learning algorithms not using ranked rewards.
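
The ranking idea described above can be sketched as follows: recent episode rewards are kept in a buffer, and each new reward is mapped to +1 or -1 depending on whether it beats a percentile threshold of that buffer. This is an illustration under stated assumptions, not the authors' implementation; the class name `RankedReward`, the buffer size, the percentile, and the random tie-breaking are choices made for this example.

```python
import random
from collections import deque


class RankedReward:
    """Turn raw single-agent rewards into a relative win/loss signal."""

    def __init__(self, buffer_size=250, percentile=75.0, seed=0):
        # Only the most recent `buffer_size` rewards are kept (assumed value)
        self.buffer = deque(maxlen=buffer_size)
        self.percentile = percentile
        self.rng = random.Random(seed)

    def threshold(self):
        """Return the reward at the configured percentile of the buffer."""
        ordered = sorted(self.buffer)
        index = int(len(ordered) * self.percentile / 100.0)
        return ordered[min(index, len(ordered) - 1)]

    def reshape(self, reward):
        """Record `reward` and return +1.0 or -1.0 relative to the threshold."""
        self.buffer.append(reward)
        cutoff = self.threshold()
        if reward > cutoff:
            return 1.0
        if reward < cutoff:
            return -1.0
        # Ties are broken at random (an assumption of this sketch)
        return self.rng.choice([-1.0, 1.0])
```

Because the threshold moves with the agent's own recent results, the agent is in effect competing against its past performance, which is how a single-player problem can borrow the win/loss signal of two-player self-play.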

Race for the Galaxy AI


What makes a game replayable over time? It offers new challenges over and over again. One way to do that is to include an AI opponent so skilled that even advanced players will continue to be challenged after hundreds of hours of play. Race has been one of the top-selling board games this year, partly because of the neural network that powers its AI. Race for the Galaxy uses a temporal difference neural network.

Building an AI that Can Beat You at Your Own Game – Towards Data Science


The full instructions are here, and a sample game is here. AIs are now better than humans at Backgammon, Checkers, Chess, Othello, and Go. See Andrey Kurenkov's A 'Brief' History of Game AI Up To AlphaGo for a more in-depth timeline. In 2017, Michael Tucker, Nikhil Prabala, and I set out to create PAI, the world's first AI for Pathwayz. The AIs for Othello and Backgammon were especially relevant to our development of PAI. Othello, like Pathwayz, is a relatively young game, at least compared to the ancient Backgammon, Checkers, Chess, and Go.

Belief Reward Shaping in Reinforcement Learning

AAAI Conferences

A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, this can lead to problems with convergence. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. However, in general our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments are run on a gridworld and a more complex backgammon domain that show that we can learn tasks significantly faster when we specify intuitive priors on the reward distribution.
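
One simple way to picture a prior belief that decays with experience is a running estimate in which the prior counts as a fixed number of pseudo-observations. The sketch below uses that conjugate-mean form purely as an illustration; the paper's exact formulation is not given in this abstract, and the class name `BeliefShapedReward`, the `prior_mean` callable, and `prior_strength` are assumptions introduced here.

```python
from collections import defaultdict


class BeliefShapedReward:
    """Blend a prior reward belief with observed rewards per (state, action)."""

    def __init__(self, prior_mean, prior_strength=5.0):
        # prior_mean: callable (state, action) -> float, the designer's belief
        # prior_strength: weight of the prior, in pseudo-observations (assumed)
        self.prior_mean = prior_mean
        self.prior_strength = prior_strength
        self.counts = defaultdict(int)
        self.sums = defaultdict(float)

    def update(self, state, action, reward):
        """Record an observed reward and return the shaped reward estimate."""
        key = (state, action)
        self.counts[key] += 1
        self.sums[key] += reward
        n = self.counts[key]
        k = self.prior_strength
        # The prior's share is k / (k + n), so it shrinks as n grows
        return (k * self.prior_mean(state, action) + self.sums[key]) / (k + n)
```

The returned value could be used in place of the raw reward in a Q-learning update. Early on it is dominated by the prior, which can help when true rewards are delayed; with more visits it approaches the empirical mean of the observed rewards.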

What You Need to Know About Machine Learning - Part 2 - Phrasee


Note: If you have already read part 1 of this series, you are already well on your way to becoming a machine learning expert. If not, you should read it now. When considering machine learning as a concept, it is important to remember that it is a complex field. One that's rife with categories and subcategories, with yet more subcategories being added by the day. To delve too deeply into all of these would be to curse you, dear reader, to several torturous hours of maths and more maths until you would simply give up and decide to watch YouTube videos about X-rays of objects found in people's butts.

A Conversation with Christos Papadimitriou

AITopics Original Links

Christos Papadimitriou, the C. Lester Hogan Professor of Electrical Engineering and Computer Science at the University of California at Berkeley, is this year's recipient of the Katayanagi Prize for Research Excellence. Carnegie Mellon University has cited Dr. Papadimitriou as "an internationally recognized expert on the theory of algorithms and complexity, and its applications to databases, optimization, artificial intelligence, networks and game theory." We recently spoke with Papadimitriou and, among other topics, delved into the underpinnings of science, the economics of the programming market, the mysterious complexity of the Web, quantum computing, and the computer scientist as popular novelist. Next month, we talk with Dr. Erik Demaine, recipient of this year's Katayanagi Emerging Leadership Prize.

CP: I didn't know I had been nominated. She mentioned the previous winner, so I thought someone else won the prize and that I was invited to speak at the ceremony. I replied, "Yeah, okay, let me think about it, give me a week..." She wrote back in astonishment, thinking I was not accepting the prize!