"The capacity to think about our own thinking may lie at the heart of what it means to be both human and intelligent. Philosophers and cognitive scientists have investigated these matters for many years. Researchers in artificial intelligence have gone further, attempting to implement actual machines that mimic, simulate, and perhaps even replicate this capacity, called metareasoning."
– from Metareasoning: Thinking about Thinking. Edited by Michael T. Cox and Anita Raja. MIT Press, 2011.
When minimizing makespan during off-line planning, the fastest action sequence to reach a particular state is, by definition, preferred. When trying to reach a goal quickly in on-line planning, previous work has inherited that assumption: the faster of two paths that both reach the same state is usually considered to dominate the slower one. In this short paper, we point out that, when planning happens concurrently with execution, selecting a slower action can allow additional time for planning, leading to better plans. We present Slo'RTS, a metareasoning planning algorithm that estimates whether the expected improvement in future decision-making from this increased planning time is enough to make up for the increased duration of the selected action. Using simple benchmarks, we show that Slo'RTS can yield shorter time-to-goal than a conventional planner. This generalizes previous work on metareasoning in on-line planning and highlights the inherent uncertainty present in an on-line setting.
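The core trade-off the abstract describes can be sketched in a few lines. This is a toy model under stated assumptions, not the Slo'RTS algorithm itself: we assume planning runs concurrently with the chosen action, and that each second of planning shaves a fixed number of seconds (`improvement_rate`, a hypothetical parameter) off the remaining plan, saturating at zero.

```python
# Toy sketch of the slower-is-sometimes-better idea: a slower action buys
# more concurrent planning time, which may shorten the remaining plan by
# more than the action's extra duration. All names and the linear
# improvement model are illustrative assumptions.

def expected_time_to_goal(action_duration, planning_time,
                          improvement_rate, baseline_remaining):
    """Estimated total time-to-goal if we commit to this action.

    While the action executes, the planner keeps working for
    `planning_time` seconds; each second of planning is assumed to save
    `improvement_rate` seconds of remaining execution, capped at zero.
    """
    expected_savings = min(improvement_rate * planning_time, baseline_remaining)
    return action_duration + (baseline_remaining - expected_savings)

def choose_action(actions, improvement_rate, baseline_remaining):
    """Pick the action minimizing estimated time-to-goal.

    `actions` maps an action name to its duration; planning happens
    concurrently, so planning_time equals the action's duration.
    """
    return min(
        actions,
        key=lambda a: expected_time_to_goal(
            actions[a], actions[a], improvement_rate, baseline_remaining
        ),
    )

# With a strong planner (each second of planning saves two seconds of
# execution), the slower action wins despite its longer duration.
print(choose_action({"fast": 1.0, "slow": 3.0},
                    improvement_rate=2.0, baseline_remaining=10.0))
```

When `improvement_rate` drops below one (a second of planning saves less than a second of execution), the same rule reverts to the conventional preference for the faster action.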
In this volume, leading authorities offer a variety of perspectives--drawn from philosophy, cognitive psychology, and computer science--on reasoning about the reasoning process. The book offers a simple model of reasoning about reasoning as a framework for its discussions.
The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning---a process that trades off the cost of planning and the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.
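The trade-off between planning cost and policy improvement can be illustrated with a small stopping rule. This is a hedged sketch, not the paper's BRTDP-based procedure: we assume an anytime planner whose policy's suboptimality gap shrinks by a fixed fraction per planning step (an illustrative model), while each step of thinking costs the agent a fixed amount (e.g. a hovering drone's energy).

```python
# Toy model of metareasoning with a cost of thinking: keep planning while
# the marginal policy improvement exceeds the per-step cost, then act.
# The geometric-improvement model and all constants are assumptions made
# for illustration only.

def plan_with_thinking_cost(initial_gap, improvement_factor,
                            step_cost, max_steps=100):
    """Return (steps_taken, net_value) for an anytime planner.

    `initial_gap` is the value lost to the current policy's suboptimality;
    each planning step closes a fraction `improvement_factor` of the
    remaining gap but costs `step_cost` to perform.
    """
    gap = initial_gap
    net = 0.0
    for step in range(max_steps):
        improvement = gap * improvement_factor
        if improvement <= step_cost:   # thinking no longer pays for itself
            return step, net
        net += improvement - step_cost
        gap -= improvement
    return max_steps, net
```

Under this model the optimal stopping point arrives as soon as the next step's improvement dips below its cost; with no cost of thinking (`step_cost=0`) the rule degenerates to planning until convergence, recovering the conventional free-planning assumption.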
Selecting the right algorithm is an important problem in computer science, because the algorithm often has to exploit the structure of the input to be efficient. The human mind faces the same challenge. Therefore, solutions to the algorithm selection problem can inspire models of human strategy selection and vice versa. Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. We apply our theory to model how people choose between cognitive strategies and test its prediction in a behavioral experiment. We find that people quickly learn to adaptively choose between cognitive strategies. People's choices in our experiment are consistent with our model but inconsistent with previous theories of human strategy selection. Rational metareasoning appears to be a promising framework for reverse-engineering how people choose among cognitive strategies and translating the results into better solutions to the algorithm selection problem.
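The sorting example from the abstract can be made concrete with a small sketch. This is an illustrative stand-in, not the authors' method: we select between two sorters by comparing toy runtime models evaluated on a cheap input feature (sampled "sortedness"); the feature and both cost models are assumptions.

```python
# Hedged sketch of algorithm selection as metareasoning: estimate each
# algorithm's cost from cheap input features and run the predicted winner.
# Insertion sort is modeled as near-linear on nearly sorted input and
# quadratic on random input; merge sort as n log n regardless of order.

import math
import random

def sortedness(xs, samples=50):
    """Fraction of sampled adjacent pairs already in order (a cheap feature)."""
    if len(xs) < 2:
        return 1.0
    idx = [random.randrange(len(xs) - 1) for _ in range(samples)]
    return sum(xs[i] <= xs[i + 1] for i in idx) / samples

def predicted_cost(algorithm, n, s):
    """Toy comparison-count models parameterized by size n and sortedness s."""
    if algorithm == "insertion":
        return n + (1 - s) * n * n / 2   # ~n when sorted, ~n^2/2 when random
    return n * math.log2(max(n, 2))      # "merge"

def select_sorter(xs):
    """Metareasoning step: pick the algorithm with the lowest predicted cost."""
    n, s = len(xs), sortedness(xs)
    return min(("insertion", "merge"), key=lambda a: predicted_cost(a, n, s))
```

Replacing the hand-written cost models with models learned from observed runtimes is the natural next step, and mirrors the abstract's finding that people learn to choose strategies adaptively.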
My research seeks to answer the question of how any agent that is tasked with making sense of its world, by finding explanations for evidence (e.g., sensor reports) using domain-general strategies, may accurately and efficiently handle incomplete evidence, noisy evidence, and an incomplete knowledge base. I propose the following answer to the question. The agent should employ an optimal abductive reasoning algorithm (developed piece-wise and shown to be best in a class of similar algorithms) that allows it to reason from evidence to causes. For the sake of efficiency and operational concerns, the agent should establish beliefs periodically rather than waiting until it has obtained all evidence it will ever be able to obtain. If the agent commits to beliefs on the basis of incomplete or noisy evidence or an incomplete knowledge base, these beliefs may be incorrect. Future evidence obtained by the agent may result in failed predictions or anomalies. The agent is then tasked with determining whether it should retain its beliefs and therefore discount the newly-obtained evidence, revise its prior beliefs, or expand its knowledge base (what can be described as anomaly-driven or explanation-based learning). I have developed an abductive metareasoning procedure that aims to appropriately reason about these situations. Preliminary experiments in two reasoning tasks indicate that the procedure is effective.
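The three-way choice described above (discount the evidence, revise beliefs, or expand the knowledge base) can be caricatured as a scoring rule. Everything here is a hypothetical illustration, not the author's abductive metareasoning procedure: the three scores and their inputs are invented for the sketch.

```python
# Toy illustration of anomaly-driven metareasoning: when new evidence
# contradicts a prediction, score three possible responses and take the
# best. The multiplicative scoring model is an assumption for exposition.

def respond_to_anomaly(belief_confidence, evidence_reliability, kb_coverage):
    """Return the best response to a failed prediction.

    All inputs are in [0, 1]:
      belief_confidence    - how well past evidence supports current beliefs
      evidence_reliability - how trustworthy the new report is
      kb_coverage          - how complete the knowledge base seems here
    """
    scores = {
        # strong beliefs + shaky evidence -> discount the new evidence
        "discount_evidence": belief_confidence * (1 - evidence_reliability),
        # reliable evidence + weak beliefs -> revise prior beliefs
        "revise_beliefs": evidence_reliability * (1 - belief_confidence) * kb_coverage,
        # reliable evidence a sparse KB cannot explain -> learn a new cause
        "expand_knowledge": evidence_reliability * (1 - kb_coverage),
    }
    return max(scores, key=scores.get)
```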
Metareasoning is the process of reasoning about reasoning itself. It is composed of both the meta-level control of computational activities and the introspective monitoring of reasoning. Meta-level control is the ability of an agent to efficiently trade off its resources between object-level actions (computations) and ground-level actions to maximize the quality of its decisions. While meta-level control allows agents to dynamically adapt their object-level computation, it could interfere with ground-level performance. Identifying the decision points that require meta-level control is of importance to the performance of agents operating in resource-bounded environments.
Representations of an AI agent's mental states and processes are necessary to enable metareasoning, i.e., thinking about thinking. However, the formulation of suitable representations remains an outstanding AI research challenge, with no clear consensus on how to proceed. This paper outlines an approach involving the formulation of anthropomorphic self-models, where the representations that are used for metareasoning are based on formalizations of commonsense psychology. We describe two research activities that support this approach: the formalization of broad-coverage commonsense psychology theories, and the use of these representations in the monitoring and control of object-level reasoning. We focus specifically on metareasoning about memory, but argue that anthropomorphic self-models support the development of integrated, reusable, broad-coverage representations for use in metareasoning systems.
What role does metareasoning play in models of bounded rationality? We examine the various existing computational approaches to bounded rationality and divide them into three classes. Only one of these classes significantly relies on a metareasoning component. We explore the characteristics of this class of models and argue that it offers desirable properties. In fact, many of the effective approaches to bounded rationality that have been developed since the early 1980s match this particular paradigm. We conclude with some open research problems and challenges.
Metareasoning research often lays out high-level principles, which are then applied in the context of larger systems. While this approach has proven quite successful, it sometimes obscures how metareasoning can be seen as a crisp computational problem in its own right. This alternative view allows us to apply tools from the theory of algorithms and computational complexity to metareasoning. In this paper, we consider some known results on how variants of the metareasoning problem can be precisely formalized as computational problems, and shown to be computationally hard to solve to optimality. We discuss a variety of techniques for addressing these hardness results.
This manifesto proposes a simple model of metareasoning that constitutes a general framework to organize research on this topic. The claim is that metareasoning, like the action-perception cycle of reasoning, is composed of the introspective monitoring of reasoning and the subsequent meta-level control of reasoning. This model holds for single agent and multiagent systems and is broad enough to include models of self. We offer the model as a short conversation piece to which the community can compare and contrast individual theories.