The intelligent reformulation or restructuring of a belief network can greatly increase the efficiency of inference. However, time expended for reformulation is not available for performing inference. Thus, under time pressure, there is a tradeoff between the time dedicated to reformulating the network and the time applied to the implementation of a solution. We investigate this partition of resources into time applied to reformulation and time used for inference. We shall describe first general principles for computing the ideal partition of resources under uncertainty. These principles have applicability to a wide variety of problems that can be divided into interdependent phases of problem solving. After, we shall present results of our empirical study of the problem of determining the ideal amount of time to devote to searching for clusters in belief networks. In this work, we acquired and made use of probability distributions that characterize (1) the performance of alternative heuristic search methods for reformulating a network instance into a set of cliques, and (2) the time for executing inference procedures on various belief networks. Given a preference model describing the value of a solution as a function of the delay required for its computation, the system selects an ideal time to devote to reformulation.
We can achieve significant gains in the value of computation by metareasoning about the nature or extent of base-level problem solving before executing a solution. However, resources that are irrevocably committed to metareasoning are not available for executing a solution. Thus, it is important to determine the portion of resources we wish to apply to metareasoning and control versus to the execution of a solution plan. Recent research on rational agency has highlighted the importance of limiting the consumption of resources by metareasoning machinery. We shall introduce the metareasoning-partition problem–the problem of ideally apportioning costly reasoning resources to planning a solution versus applying resource to executing a solution to a problem. We exercise prototypical metareasoning-partition models to probe the relationships between time allocated to metareasoning and to execution for different problem classes.
We discuss representing and reasoning with knowledge about the time-dependent utility of an agent's actions. Time-dependent utility plays a crucial role in the interaction between computation and action under bounded resources. We present a semantics for time-dependent utility and describe the use of time-dependent information in decision contexts. We illustrate our discussion with examples of time-pressured reasoning in Protos, a system constructed to explore the ideal control of inference by reasoners with limit abilities.
The efficient use of limited computational resources is an essential ingredient of intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but this is computationally intractable. Inspired by psychology and neuroscience, we propose the first concrete and domain-general learning algorithm for approximating the optimal selection of computations: Bayesian metalevel policy search (BMPS). We derive this general, sample-efficient search algorithm for a computation-selecting metalevel policy based on the insight that the value of information lies between the myopic value of information and the value of perfect information. We evaluate BMPS on three increasingly difficult metareasoning problems: when to terminate computation, how to allocate computation between competing options, and planning. Across all three domains, BMPS achieved near-optimal performance and compared favorably to previously proposed metareasoning heuristics. Finally, we demonstrate the practical utility of BMPS in an emergency management scenario, even accounting for the overhead of metareasoning.
The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning---a process that trades off the cost of planning and the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.