Eric B. Baum 1
NEC Research Institute, 4 Independence Way, Princeton NJ 08540
eric@research.NJ.NEC.COM

Abstract

The point of game tree search is to insulate oneself from errors in the evaluation function. The standard approach is to grow a full-width tree as deep as time allows, and then value the tree as if the leaf evaluations were exact. This has been effective in many games because of the computational efficiency of the alpha-beta algorithm. A Bayesian would instead suggest training a model of one's uncertainty. This model supplies extra information beyond the standard evaluation function. Within such a formal model, there is an optimal tree growth procedure and an optimal method of valuing the tree. We describe how to optimally value the tree, and how to approximate on line the optimal tree to search.
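To make the contrast between point values and distributions concrete, here is a minimal sketch, not the paper's procedure. It assumes that leaf values are independent discrete distributions and that the true values become known before the move is chosen; the names `max_distribution`, `risky`, and `safe` are hypothetical.

```python
from itertools import product


def max_distribution(children):
    """Distribution of the maximum of independent discrete values.

    Each child is a dict mapping a possible value to its probability.
    Independence between children is an assumption of this sketch.
    """
    result = {}
    # Enumerate every joint outcome of the children.
    for combo in product(*(child.items() for child in children)):
        best = max(value for value, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        result[best] = result.get(best, 0.0) + prob
    return result


def mean(dist):
    """Expected value of a discrete distribution."""
    return sum(value * p for value, p in dist.items())


# Two hypothetical leaves with the same mean but different uncertainty.
risky = {0.0: 0.5, 1.0: 0.5}
safe = {0.5: 1.0}

# Treating leaf evaluations as exact: take the max of the point values.
point_value = max(mean(risky), mean(safe))

# Treating leaf evaluations as distributions: take the mean of the max.
distribution_value = mean(max_distribution([risky, safe]))

print(point_value, distribution_value)
```

With these two leaves the point-value estimate is 0.5 while the distribution-based estimate is 0.75. The gap reflects what could be gained by resolving the uncertainty about the risky leaf before committing to a move.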

Commentary on Baum's "How a Bayesian..?"
Stuart Russell, Computer Science Division, University of California, Berkeley, CA 94720.

Summary of the Paper

The paper divides the problem of game playing into two parts: growing a search tree and evaluating the possible moves on that basis. The evaluation process is based in part on the idea that leaf node evaluations should be probability distributions rather than point values, and should [...]

[...] I. J. Good, for example, suggested that a computation is [...] This rules out computations that might reveal one's plan to be a blunder--OK for politicians, but not for game-playing programs. Part of the difficulty lies in the formulation. P(A|B) should be independent of the form of B--i.e., any logically equivalent expression should be treated the same way-- [...] forms.

Decision-theoretic control of search has previously used as its basic unit of computation the generation and evaluation of a complete set of successors. Although this simplifies analysis, it results in some lost opportunities for pruning and satisficing. This paper therefore extends the analysis of the value of computation to cover individual successor evaluations. The analytic techniques used may prove useful for control of reasoning in more general settings. A formula is developed for the expected value of a node, k of whose n successors have been evaluated. This formula is used to estimate the value of expanding further successors, using a general formula for the value of a computation in game-playing developed in earlier work. We exhibit an improved version of the MGSS* algorithm, giving empirical results for the game of Othello.
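The abstract does not state the formula itself, so the following is only an illustrative sketch of the kind of quantity involved, not the authors' derivation. It assumes each unevaluated successor's value is an independent draw from a known discrete prior and that the node's value is the maximum over its successors; `expected_node_value` and `prior` are hypothetical names.

```python
def expected_node_value(evaluated, n, prior):
    """Expected value of a max node when only some successors are evaluated.

    evaluated: list of the k successor values seen so far (k >= 1).
    n:         total number of successors.
    prior:     dict mapping value -> probability for one unevaluated
               successor (assumed i.i.d., an assumption of this sketch).
    """
    if not evaluated:
        raise ValueError("at least one successor must be evaluated")
    remaining = n - len(evaluated)
    best = max(evaluated)
    if remaining == 0:
        return best

    expectation = 0.0
    cumulative = 0.0      # CDF of a single unevaluated successor
    previous_power = 0.0  # P(max of the remaining draws < current value)
    for value in sorted(prior):
        cumulative += prior[value]
        # P(max of `remaining` i.i.d. draws <= value) = CDF(value) ** remaining
        power = cumulative ** remaining
        prob_max_is_value = power - previous_power
        expectation += max(best, value) * prob_max_is_value
        previous_power = power
    return expectation


prior = {0.0: 0.5, 1.0: 0.5}
current = expected_node_value([0.5], 1, prior)   # all successors evaluated
partial = expected_node_value([0.5], 3, prior)   # two successors still unknown
print(current, partial, partial - current)
```

Under these assumptions the difference `partial - current` is the expected gain from evaluating all remaining successors. A controller could weigh a gain of this kind against the cost of the extra evaluations.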

Now the whole point of search (as opposed to just picking whichever child looks best to an evaluation function) is to insulate oneself from errors in the evaluation function. When one searches below a node, one gains more information and one's opinion of the value of that node may change. Such "opinion changes" are inherently probabilistic. They occur because one's information or computational abilities are unable to distinguish different states, e.g., a node with a given set of features might have different values. In this paper we adopt a probabilistic model of opinion changes, de-

1 This is a super-abbreviated discussion of [Baum and Smith, 1993] written by EBB for this conference.

Frederick Callaway, Sayan Gul, Paul M. Krueger, Thomas L. Griffiths, Falk Lieder

The efficient use of limited computational resources is an essential ingredient of intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but this is computationally intractable. Inspired by psychology and neuroscience, we propose the first concrete and domain-general learning algorithm for approximating the optimal selection of computations: Bayesian metalevel policy search (BMPS). We derive this general, sample-efficient search algorithm for a computation-selecting metalevel policy based on the insight that the value of information lies between the myopic value of information and the value of perfect information. We evaluate BMPS on three increasingly difficult metareasoning problems: when to terminate computation, how to allocate computation between competing options, and planning. Across all three domains, BMPS achieved near-optimal performance and compared favorably to previously proposed metareasoning heuristics. Finally, we demonstrate the practical utility of BMPS in an emergency management scenario, even accounting for the overhead of metareasoning.
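The two bounding quantities named in the abstract can be illustrated on a toy problem. The sketch below is not BMPS. It assumes a choice between an option of known value and an option worth either 0 or 1, where one computation returns a binary signal that is correct with probability `accuracy`. All names and numbers are hypothetical.

```python
def myopic_voi(prior_b, known_a, accuracy):
    """Expected gain from one noisy computation, then choosing immediately.

    prior_b:  probability that option B is worth 1 (otherwise 0), in (0, 1).
    known_a:  value of option A, assumed known exactly.
    accuracy: probability the signal matches B's true value, in (0, 1).
    """
    # Probability of observing a "high" signal.
    p_high = prior_b * accuracy + (1 - prior_b) * (1 - accuracy)
    # Posterior probability that B is worth 1 after each signal (Bayes' rule).
    post_high = prior_b * accuracy / p_high
    post_low = prior_b * (1 - accuracy) / (1 - p_high)
    # B's expected value equals P(B = 1) because its value is 0 or 1.
    value_now = max(known_a, prior_b)
    value_after = (p_high * max(known_a, post_high)
                   + (1 - p_high) * max(known_a, post_low))
    return value_after - value_now


def value_of_perfect_information(prior_b, known_a):
    """Expected gain from learning B's true value before choosing."""
    value_now = max(known_a, prior_b)
    value_after = (prior_b * max(known_a, 1.0)
                   + (1 - prior_b) * max(known_a, 0.0))
    return value_after - value_now


voi_1 = myopic_voi(prior_b=0.5, known_a=0.5, accuracy=0.8)
vpi = value_of_perfect_information(prior_b=0.5, known_a=0.5)
print(voi_1, vpi)
```

In this toy case the myopic value is about 0.15 and the value of perfect information is 0.25. A single noisy computation cannot be worth more than full knowledge, which is the upper end of the bracket the abstract describes.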