Eric B. Baum
NEC Research Institute, 4 Independence Way, Princeton NJ 08540
eric@research.NJ.NEC.COM

Abstract

The point of game tree search is to insulate oneself from errors in the evaluation function. The standard approach is to grow a full-width tree as deep as time allows, and then value the tree as if the leaf evaluations were exact. This has been effective in many games because of the computational efficiency of the alpha-beta algorithm. A Bayesian would suggest instead training a model of one's uncertainty. Such a model supplies extra information beyond the standard evaluation function. Within such a formal model, there is an optimal tree growth procedure and an optimal method of valuing the tree. We describe how to value the tree optimally, and how to approximate online the optimal tree to search.
The best chess machines are competitive with the best humans, but they generate millions of positions per move. Their human opponents examine only tens of positions, yet search much deeper along some lines of play. Evidently, people are far more selective in their choice of positions to examine. The importance of selective search was first recognized by Shannon (1950). Most work on game-tree search has focused on algorithms that make the same decisions as full-width, fixed-depth minimax. This includes alpha-beta pruning (Knuth & Moore 1975), fixed and dynamic node ordering (Slagle & Dixon 1969), SSS* (Stockman 1979), Scout (Pearl 1984), aspiration windows (Kaindl, Shams, & Horacek 1991), etc.
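To make the baseline concrete, the following is a minimal sketch of the alpha-beta algorithm mentioned above (the tree encoding as nested lists and the function name are illustrative, not from the paper). It returns the same value as full-width, fixed-depth minimax while pruning branches that cannot affect the root value.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `node`, with alpha-beta pruning.

    A node is either a scalar (a leaf evaluation) or a list of child nodes.
    """
    if not isinstance(node, list):          # leaf: exact scalar evaluation
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # cutoff: opponent avoids this line
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:               # cutoff
                break
        return value

tree = [[3, 5], [2, 9], [0, 7]]             # depth-2 tree, max to move at root
print(alphabeta(tree))                      # → 3
```

Note that the scalar returned is treated as exact; the pruning condition compares single numbers, which is precisely why alpha-beta has no natural slot for a model of evaluation uncertainty.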
The values returned by evaluation functions in game-playing programs are generally uncertain. One can attempt to model this uncertainty directly by representing leaf node evaluations as ranges or distributions and devising methods to estimate and manipulate these quantities (Berliner 1979; Palay 1985; Russell & Wefald 1991; Berliner & McConnell 1995; Baum & Smith 1997). This can lead to principled ways to explore a game tree, both in where to look and how much effort to expend. In spite of the great intuitive appeal of these methods, the overwhelming majority of high-performance game-playing programs, Deep Blue included, use some form of the alpha-beta algorithm (Knuth & Moore 1975). This is in part due to the simplicity, efficiency, and empirical effectiveness of alpha-beta. One key feature of alpha-beta is its use of a scalar evaluation function, which offers no easy way to model evaluation uncertainty. Alpha-beta must therefore manage the uncertainty indirectly, through search. The general idea is to focus searching effort on parts of the

Copyright 1999, American Association for Artificial Intelligence (www.aaai.org).
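A small sketch of the distributional alternative described above: if each leaf evaluation is a discrete probability distribution and children are assumed independent, the distribution at a max node follows from the product of the children's cumulative distributions. The encoding (a dict mapping value to probability) and function names are illustrative assumptions, not the paper's actual representation.

```python
def cdf(dist, v):
    """P(X <= v) for a discrete distribution {value: probability}."""
    return sum(p for x, p in dist.items() if x <= v)

def max_node_dist(children):
    """Distribution of max over independent children.

    Uses P(max = v) = P(all children <= v) - P(all children < v),
    evaluated over the union of the children's supports.
    """
    support = sorted({v for d in children for v in d})
    out = {}
    prev = 0.0
    for v in support:
        cum = 1.0
        for d in children:
            cum *= cdf(d, v)
        out[v] = cum - prev
        prev = cum
    return {v: p for v, p in out.items() if p > 1e-12}

# Two uncertain leaf evaluations under a max node:
a = {0: 0.5, 4: 0.5}
b = {1: 0.5, 3: 0.5}
print(max_node_dist([a, b]))   # → {1: 0.25, 3: 0.25, 4: 0.5}
```

A min node is handled symmetrically with the survival function in place of the CDF. The point of carrying distributions rather than scalars is that the root distribution exposes *how much* an extra leaf expansion could change the decision, which is the information an optimal tree-growth procedure needs.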