We can achieve significant gains in the value of computation by metareasoning about the nature or extent of base-level problem solving before executing a solution. However, resources that are irrevocably committed to metareasoning are not available for executing a solution. Thus, it is important to determine the portion of resources we wish to apply to metareasoning and control versus the execution of a solution plan. Recent research on rational agency has highlighted the importance of limiting the consumption of resources by metareasoning machinery. We introduce the metareasoning-partition problem: the problem of ideally apportioning costly reasoning resources between planning a solution and executing a solution to a problem. We exercise prototypical metareasoning-partition models to probe the relationships between time allocated to metareasoning and to execution for different problem classes. Finally, we examine the value of metareasoning in the context of our functional analyses. This work was supported by a NASA Fellowship under Grant NCC-220-51, by the National Science Foundation under Grant IRI-8703710, and by the U.S. Army Research Office under Grant P-25514-EL. Computing facilities were provided by the SUMEX-AIM Resource under NLM Grant LM05208.

Breese, John S., Horvitz, Eric J.

The intelligent reformulation or restructuring of a belief network can greatly increase the efficiency of inference. However, time expended for reformulation is not available for performing inference. Thus, under time pressure, there is a tradeoff between the time dedicated to reformulating the network and the time applied to the implementation of a solution. We investigate this partition of resources into time applied to reformulation and time used for inference. We first describe general principles for computing the ideal partition of resources under uncertainty. These principles apply to a wide variety of problems that can be divided into interdependent phases of problem solving. We then present results of our empirical study of the problem of determining the ideal amount of time to devote to searching for clusters in belief networks. In this work, we acquired and made use of probability distributions that characterize (1) the performance of alternative heuristic search methods for reformulating a network instance into a set of cliques, and (2) the time for executing inference procedures on various belief networks. Given a preference model describing the value of a solution as a function of the delay required for its computation, the system selects an ideal time to devote to reformulation.
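The selection step described at the end of this abstract can be illustrated with a minimal sketch. It is not the authors' system: the inference-time model `toy_inference_samples`, the preference model `toy_utility`, and the candidate grid are all hypothetical stand-ins, chosen only to show how an expected-utility comparison over reformulation times might look.

```python
import math

def expected_utility(t_reform, inference_time_samples, utility):
    """Average utility of the total delay over sampled inference times."""
    samples = inference_time_samples(t_reform)
    return sum(utility(t_reform + t_inf) for t_inf in samples) / len(samples)

def best_reformulation_time(candidates, inference_time_samples, utility):
    """Pick the candidate reformulation time with the highest expected utility."""
    return max(
        candidates,
        key=lambda t: expected_utility(t, inference_time_samples, utility),
    )

# Hypothetical model: more reformulation shortens inference, with
# diminishing returns and some spread around the typical inference time.
def toy_inference_samples(t_reform):
    base = 10.0 / (1.0 + t_reform)
    return [0.5 * base, base, 1.5 * base]

# Hypothetical preference model: value decays exponentially with delay.
def toy_utility(delay):
    return math.exp(-0.1 * delay)

candidates = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
best = best_reformulation_time(candidates, toy_inference_samples, toy_utility)
print(best)
```

Under these toy assumptions, spending no time on reformulation leaves inference slow, while spending too much delays the answer past the point where it is valuable, so an intermediate candidate wins.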

Lin, Christopher H. (University of Washington) | Kolobov, Andrey (Microsoft Research) | Kamar, Ece (Microsoft Research) | Horvitz, Eric (Microsoft Research)

The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning, a process that trades off the cost of planning against the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.
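The tradeoff between planning cost and policy improvement can be made concrete with a simple myopic stopping rule. This sketch is an illustration only and is not the BRTDP-based procedure the abstract refers to: the function `planning_steps_before_acting`, the list of estimated gains, and the fixed per-step cost are all hypothetical.

```python
def planning_steps_before_acting(estimated_gains, cost_per_step):
    """Count the planning steps a myopic rule takes before it acts.

    The rule keeps planning while the estimated improvement from the next
    step exceeds the cost of taking that step, and stops at the first step
    that is not worth its cost.
    """
    steps = 0
    for gain in estimated_gains:
        if gain <= cost_per_step:
            break
        steps += 1
    return steps

# Hypothetical diminishing returns from successive planning steps.
gains = [5.0, 2.5, 1.2, 0.6, 0.3]
steps = planning_steps_before_acting(gains, cost_per_step=1.0)
print(steps)
```

A myopic rule like this looks only one step ahead, so it can stop too early when a cheap-looking step would unlock a large later gain; that limitation is one reason more careful approximations are of interest.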

Milli, Smitha (University of California, Berkeley) | Lieder, Falk (University of California, Berkeley) | Griffiths, Thomas L. (University of California, Berkeley)

While optimal metareasoning is notoriously intractable, humans are nonetheless able to adaptively allocate their computational resources. A possible approximation that humans may use to do this is to metareason over only a finite set of cognitive systems that perform variable amounts of computation. The highly influential "dual-process" accounts of human cognition, which postulate the coexistence of a slow accurate system with a fast error-prone system, can be seen as a special case of this approximation. This raises two questions: how many cognitive systems should a bounded optimal agent be equipped with, and what characteristics should those systems have? We investigate these questions in two settings: a one-shot decision between two alternatives, and planning under uncertainty in a Markov decision process. We find that the optimal number of systems depends on the variability of the environment and the costliness of metareasoning. Consistent with dual-process theories, we also find that when having two systems is optimal, the first system is fast but error-prone and the second system is slow but accurate.

Lieder, Falk, Plunkett, Dillon, Hamrick, Jessica B., Russell, Stuart J., Hay, Nicholas, Griffiths, Tom

Selecting the right algorithm is an important problem in computer science, because the algorithm often has to exploit the structure of the input to be efficient. The human mind faces the same challenge. Therefore, solutions to the algorithm selection problem can inspire models of human strategy selection and vice versa. Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. We apply our theory to model how people choose between cognitive strategies and test its prediction in a behavioral experiment. We find that people quickly learn to adaptively choose between cognitive strategies. People's choices in our experiment are consistent with our model but inconsistent with previous theories of human strategy selection. Rational metareasoning appears to be a promising framework for reverse-engineering how people choose among cognitive strategies and translating the results into better solutions to the algorithm selection problem.
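Treating algorithm selection as a choice based on predicted cost can be sketched as follows. This is a toy illustration, not the model derived in the paper: the feature `presortedness`, the cost formulas in `predicted_cost`, and the two candidate algorithm names are assumptions made for the example.

```python
import math

def presortedness(xs):
    """Fraction of adjacent pairs that are already in nondecreasing order."""
    if len(xs) < 2:
        return 1.0
    ordered = sum(1 for a, b in zip(xs, xs[1:]) if a <= b)
    return ordered / (len(xs) - 1)

def predicted_cost(name, xs):
    """Hypothetical cost models expressed as rough comparison counts."""
    n = len(xs)
    if name == "insertion":
        # Assumed: linear on sorted input, quadratic as disorder grows.
        return n + (1.0 - presortedness(xs)) * n * n / 2.0
    if name == "merge":
        # Assumed: n log n regardless of input order.
        return n * math.log2(max(n, 2))
    raise ValueError(f"unknown algorithm: {name}")

def select_algorithm(xs):
    """Choose the algorithm whose predicted cost on this input is lowest."""
    return min(("insertion", "merge"), key=lambda name: predicted_cost(name, xs))

print(select_algorithm(list(range(1000))))         # already sorted input
print(select_algorithm(list(range(1000, 0, -1))))  # reversed input
```

The point of the sketch is that the choice depends on cheap features of the input rather than on running every candidate; a learned model of runtime would replace the hand-written cost formulas.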