Bard, Nolan; Foerster, Jakob N.; Chandar, Sarath; Burch, Neil; Lanctot, Marc; Song, H. Francis; Parisotto, Emilio; Dumoulin, Vincent; Moitra, Subhodeep; Hughes, Edward; Dunning, Iain; Mourad, Shibl; Larochelle, Hugo; Bellemare, Marc G.; Bowling, Michael
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay and imperfect information in a two to five player setting. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
In Shannon's time, it would have seemed [...] Around this time, Arthur Samuel began work [...] the capabilities of computational intelligence. By 1958, Allen Newell and Herb Simon had begun their investigations into chess, which eventually led to fundamental results for AI and cognitive science (Newell, Shaw, and Simon 1958). An impressive lineup, to say the least! Indeed, one of the early goals of AI was to build a program capable of defeating the human world champion. This challenge proved to be more difficult than was anticipated; the AI literature is replete with optimistic predictions. It eventually took almost 50 years to complete the task--a remarkably short time when one considers the software and hardware advances needed to [...] the game world with the real world--the game of life--where the rules often change, the scope of the problem is almost limitless, and the participants interact in an infinite number of ways. Games can be a microcosm of the real [...] and chess programs could play at a level comparable to the human world champion. These remarkable accomplishments are the result of a better understanding of the problems being solved, major algorithmic insights, and tremendous advances in hardware technology. The work on computer games has been one of the most successful and visible results of AI research. The results are truly amazing. Even though there is an exponential difference between the best case and the [...] case (Plaat et al. 1996). [...] of the progress in building a world-class program for the game is given, along with a brief description of the strongest program. The histories are necessarily [...] Games reports the past successes where computers [...] realizing the lineage of the ideas.
We introduce OLIVAW, an AI Othello player adopting the design principles of the famous AlphaGo series. The main motivation behind OLIVAW was to attain exceptional competence in a non-trivial board game, but at a tiny fraction of the cost of its illustrious predecessors. In this paper we show how OLIVAW successfully met this challenge.
In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth extending the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as: quick wins and slow defeats; scoring; mobility or presence. The fourth is another variant of unbounded minimax, which plays the safest action instead of playing the best action. This modified search is intended to be used after the learning process. The fifth is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these different techniques to design program-players for the game of Hex (size 11 and 13) surpassing the level of Mohex 2.0 with reinforcement learning from self-play without knowledge. At Hex size 11 (without swap), the program-player reaches the level of Mohex 3HNN.
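As a rough illustration of the third technique, the sketch below contrasts the classic terminal gain with one possible "quick wins and slow defeats" heuristic. The abstract does not give the formula, so the scaling used here (one plus the number of plies left unplayed) and the names `classic_gain`, `depth_heuristic`, `plies_played`, and `max_plies` are assumptions made for illustration only, not the authors' definition.

```python
def classic_gain(winner: int) -> float:
    """Classic terminal value: +1 if the first player wins, -1 if they lose, 0 for a draw."""
    return float(winner)


def depth_heuristic(winner: int, plies_played: int, max_plies: int) -> float:
    """Hypothetical 'quick wins and slow defeats' terminal value.

    Assumption: the magnitude grows with the number of plies left unplayed,
    so an early win scores higher than a late win, and a late loss scores
    higher (less negative) than an early loss.
    """
    if not 0 <= plies_played <= max_plies:
        raise ValueError("plies_played must lie in [0, max_plies]")
    if winner == 0:
        return 0.0
    remaining = max_plies - plies_played
    return float(winner * (1 + remaining))


if __name__ == "__main__":
    max_plies = 121  # number of cells on an 11x11 Hex board
    for winner, plies in [(1, 25), (1, 100), (-1, 100), (-1, 25)]:
        value = depth_heuristic(winner, plies, max_plies)
        print(f"winner={winner:+d} plies={plies:3d} value={value:+.0f}")
```

Under this assumed scaling the terminal values rank as quick win > slow win > slow defeat > quick defeat, which gives a learner a graded signal where the classic gain offers only two values.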
The game of Chinese Checkers is a challenging traditional board game of perfect information that differs from other traditional games in two main aspects: first, unlike Chess, all checkers remain indefinitely in the game and hence the branching factor of the search tree does not decrease as the game progresses; second, unlike Go, there are also no upper bounds on the depth of the search tree since repetitions and backward movements are allowed. Therefore, even in a restricted game instance, the state-space of the game can still be unbounded, making it challenging for a computer program to excel. In this work, we present an approach that effectively combines the use of heuristics, Monte Carlo tree search, and deep reinforcement learning for building a Chinese Checkers agent without the use of any human game-play data. Experimental results show that our agent is competent under different scenarios and reaches the level of experienced human players.