Kriesgpiel, or partially observable chess, is appealing to the AI community due to its similarity to real-world applications in which a decision maker is not a lone agent changing the environment. This paper applies the framework of Interactive POMDPs to design a competent Kriegspiel player. The novel element, compared to the existing approaches, is to model the opponent as a competent player and to predict his likely moves. The moves of our own player can then be computed based on these predictions. The problem is challenging because, first, there are many possible world states the agent has to keep track of.
Summary: I describe how the TrueSkill algorithm works using concepts you're already familiar with. TrueSkill is used on Xbox Live to rank and match players and it serves as a great way to understand how statistical machine learning is actually applied today. I've also created an open source project where I implemented TrueSkill three different times in increasing complexity and capability. In addition, I've created a detailed supplemental math paper that works out equations that I gloss over here. Feel free to jump to sections that look interesting and ignore ones that seem boring. Don't worry if this post seems a bit long, there are lots of pictures. It seemed easy enough: I wanted to create a database to track the skill levels of my coworkers in chess and foosball. I already knew that I wasn't very good at foosball and would bring down better players. I was curious if an algorithm could do a better job at creating well-balanced matches. I also wanted to see if I was improving at chess. I knew I needed to have an easy way to collect results from everyone and then use an algorithm that would keep getting better with more data. I was looking for a way to compress all that data and distill it down to some simple knowledge of how skilled people are. Based on some previous things that I had heard about, this seemed like a good fit for "machine learning." Machine learning is a hot area in Computer Science-- but it's intimidating. Like most subjects, there's a lot to learn to be an expert in the field. I didn't need to go very deep; I just needed to understand enough to solve my problem. I found a link to the paper describing the TrueSkill algorithm and I read it several times, but it didn't make sense. It was only 8 pages long, but it seemed beyond my capability to understand.
Nguyen, Truong-Huy Dinh (National University of Singapore) | Hsu, David (National University of Singapore) | Lee, Wee-Sun (National University of Singapore) | Leong, Tze-Yun (National University of Singapore) | Kaelbling, Leslie Pack (Massachusetts Institute of Technology) | Lozano-Perez, Tomas (Massachusetts Institute of Technology) | Grant, Andrew Haydn (Singapore-MIT GAMBIT Game Lab)
We apply decision theoretic techniques to construct non-player characters that are able to assist a human player in collaborative games. The method is based on solving Markov decision processes, which can be difficult when the game state is described by many variables. To scale to more complex games, the method allows decomposition of a game task into subtasks, each of which can be modelled by a Markov decision process. Intention recognition is used to infer the subtask that the human is currently performing, allowing the helper to assist the human in performing the correct task. Experiments show that the method can be effective, giving near-human level performance in helping a human in a collaborative game.
Eric B. Baum 1 NEC Research Institute, 4 Independence Way, Princeton NJ 08540 eric@research.NJ.NEC.COM Abstract The point of game tree search is to insulate oneself from errors in the evaluation function. The standard approach is to grow a full width tree as deep as time allows, and then value the tree as if the leaf evaluations were exact. This has been effective in many games because of the computational efficiency of the alpha-beta algorithm. A Bayesian would suggest instead to train a model of one's uncertainty. This model adds extra information in addition to the standard evaluation function. Within such a formal model, there is an optimal tree growth procedure and an optimal method of valueing the tree. We describe how to optimally value the tree, and how to approximate on line the optimal tree to search.
Now the whole point of search (as opposed to just picking whichever child looks best to an evaluation function) is to insulate oneself from errors in the evaluation function. When one searches below a node, one gains more information and one's opinion of the value of that node may change. Such "opinion changes" are inherently probabilistic. They occur because one's information or computational abilities are unable to distinguish different states, e.g. a node with a given set of features might have different values. In this paper we adopt a probabilistic model of opinion changes, de-1This is a super-abbreviated discussion of [Baum and Smith, 1993] written by EBB for this conference.