Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered as the new electricity that is revolutionizing the world. AI is heavily invested in both industry and academy. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and related disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential and limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations for AI, methods, and machine learning are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such achievement seems impossible on the basis of present knowledge,and how AI could be improved. Finally, a summary is made of the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of contents at our own university.
We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search policies based on the KL divergence from an imitation-learned policy by applying Monte Carlo tree search produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.
Schmid, Martin, Moravcik, Matej, Burch, Neil, Kadlec, Rudolf, Davidson, Josh, Waugh, Kevin, Bard, Nolan, Timbers, Finbarr, Lanctot, Marc, Holland, Zach, Davoodi, Elnaz, Christianson, Alden, Bowling, Michael
Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
From the very dawn of the field, search with value functions was a fundamental concept of computer games research. Turing's chess algorithm from 1950 was able to think two moves ahead, and Shannon's work on chess from $1950$ includes an extensive section on evaluation functions to be used within a search. Samuel's checkers program from 1959 already combines search and value functions that are learned through self-play and bootstrapping. TD-Gammon improves upon those ideas and uses neural networks to learn those complex value functions -- only to be again used within search. The combination of decision-time search and value functions has been present in the remarkable milestones where computers bested their human counterparts in long standing challenging games -- DeepBlue for Chess and AlphaGo for Go. Until recently, this powerful framework of search aided with (learned) value functions has been limited to perfect information games. As many interesting problems do not provide the agent perfect information of the environment, this was an unfortunate limitation. This thesis introduces the reader to sound search for imperfect information games.
Endgame studies have long served as a tool for testing human creativity and intelligence. We find that they can serve as a tool for testing machine ability as well. Two of the leading chess engines, Stockfish and Leela Chess Zero (LCZero), employ significantly different methods during play. We use Plaskett's Puzzle, a famous endgame study from the late 1970s, to compare the two engines. Our experiments show that Stockfish outperforms LCZero on the puzzle. We examine the algorithmic differences between the engines and use our observations as a basis for carefully interpreting the test results. Drawing inspiration from how humans solve chess problems, we ask whether machines can possess a form of imagination. On the theoretical side, we describe how Bellman's equation may be applied to optimize the probability of winning. To conclude, we discuss the implications of our work on artificial intelligence (AI) and artificial general intelligence (AGI), suggesting possible avenues for future research.
Playing human games such as chess and Go has long been considered to be a major benchmark of human capabilities. Computer programs have become robust chess players and, since the late 1990s, have been able to beat even the best human chess champions; though, for a long time, computers were unable to beat expert Go players -- the game of Go has proven to be especially difficult for computers. However, in 2016, a new program called AlphaGo finally won a victory over a human Go champion, only to be beaten by its subsequent versions (AlphaGo Zero and AlphaZero). AlphaZero proceeded to beat the best computers and humans in chess, shogi and Go, including all its predecessors from the Alpha family . Core to AlphaZero's success is its use of a deep neural network, trained through reinforcement learning, as a powerful heuristic to guide a tree search algorithm (specifically Monte Carlo Tree Search). The recent successes of machine learning are good reason to consider the limitations of learning algorithms and, in a broader sense, the limitations of AI. In the context of a particular competition (or'game'), a natural question to ask is whether an absolute winner AI might exist -- one that, given sufficient resources, will always achieve the best possible outcome.
In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth extending the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as: quick wins and slow defeats ; scoring ; mobility or presence. The four is another variant of unbounded minimax, which plays the safest action instead of playing the best action. This modified search is intended to be used after the learning process. The five is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these different techniques to design program-players to the game of Hex (size 11 and 13) surpassing the level of Mohex 2.0 with reinforcement learning from self-play without knowledge. At Hex size 11 (without swap), the program-player reaches the level of Mohex 3HNN.
We have seen numerous machine learning methods tackle the game of chess over the years. However, one common element in these works is the necessity of a finely optimized look ahead algorithm. The particular interest of this research lies with creating a chess engine that is highly capable, but restricted in its look ahead depth. We train a deep neural network to serve as a static evaluation function, which is accompanied by a relatively simple look ahead algorithm. We show that our static evaluation function has encoded some semblance of look ahead knowledge, and is comparable to classical evaluation functions. The strength of our chess engine is assessed by comparing its proposed moves against those proposed by Stockfish. We show that, despite strict restrictions on look ahead depth, our engine recommends moves of equal strength in roughly $83\%$ of our sample positions.
Crazyhouse is a chess variant that incorporates all of the classical chess rules, but allows users to drop pieces captured from the opponent as a normal move. Until 2018, all competitive computer engines for this board game made use of an alpha-beta pruning algorithm with a hand-crafted evaluation function for each position. Previous machine learning-based algorithms for just regular chess, such as NeuroChess and Giraffe, took hand-crafted evaluation features as input rather than a raw board representation. More recent projects, such as AlphaZero, reached massive success but required massive computational resources in order to reach its final strength. This paper describes the development of SixtyFour, an engine designed to compete in the chess variant of Crazyhouse with limited hardware. This specific variant poses a multitude of significant challenges due to its large branching factor, state-space complexity, and the multiple move types a player can make. We propose the novel creation of a neural network-based evaluation function for Crazyhouse. More importantly, we evaluate the effectiveness of an ensemble model, which allows the training time and datasets to be easily distributed on regular CPU hardware commodity. Early versions of the network have attained a playing level comparable to a strong amateur on online servers.
We present SentiMATE, a novel end-to-end Deep Learning model for Chess, employing Natural Language Processing that aims to learn an effective evaluation function assessing move quality. This function is pre-trained on the sentiment of commentary associated with the training moves and is used to guide and optimize the agent's game-playing decision making. The contributions of this research are three-fold: we build and put forward both a classifier which extracts commentary describing the quality of Chess moves in vast commentary datasets, and a Sentiment Analysis model trained on Chess commentary to accurately predict the quality of said moves, to then use those predictions to evaluate the optimal next move of a Chess agent. Both classifiers achieve over 90 % classification accuracy. Lastly, we present a Chess engine, SentiMATE, which evaluates Chess moves based on a pre-trained sentiment evaluation function. Our results exhibit strong evidence to support our initial hypothesis - "Can Natural Language Processing be used to train a novel and sample efficient evaluation function in Chess Engines?" - as we integrate our evaluation function into modern Chess engines and play against agents with traditional Chess move evaluation functions, beating both random agents and a DeepChess implementation at a level-one search depth - representing the number of moves a traditional Chess agent (employing the alpha-beta search algorithm) looks ahead in order to evaluate a given chess state.