Collaborating Authors

Lieder, Falk

Have I done enough planning or should I plan more? Artificial Intelligence

People's decisions about how to allocate their limited computational resources are essential to human intelligence. An important component of this metacognitive ability is deciding whether to continue thinking about what to do and move on to the next decision. Here, we show that people acquire this ability through learning and reverse-engineer the underlying learning mechanisms. Using a process-tracing paradigm that externalises human planning, we find that people quickly adapt how much planning they perform to the cost and benefit of planning. To discover the underlying metacognitive learning mechanisms we augmented a set of reinforcement learning models with metacognitive features and performed Bayesian model selection. Our results suggest that the metacognitive ability to adjust the amount of planning might be learned through a policy-gradient mechanism that is guided by metacognitive pseudo-rewards that communicate the value of planning.

Optimal To-Do List Gamification for Long Term Planning Artificial Intelligence

Most people struggle with prioritizing work. While inexact heuristics have been developed over time, there is still no tractable principled algorithm for deciding which of the many possible tasks one should tackle in any given day, month, week, or year. Additionally, some people suffer from cognitive biases such as the present bias, leading to prioritization of their immediate experience over long-term consequences which manifests itself as procrastination and inefficient task prioritization. Our method utilizes optimal gamification to help people overcome these problems by incentivizing each task by a number of points that convey how valuable it is in the long-run. We extend the previous version of our optimal gamification method with added services for helping people decide which tasks should and should not be done when there is not enough time to do everything. To improve the efficiency and scalability of the to-do list solver, we designed a hierarchical procedure that tackles the problem from the top-level goals to fine-grained tasks. We test the accuracy of the incentivised to-do list by comparing the performance of the strategy with the points computed exactly using Value Iteration for a variety of case studies. These case studies were specifically designed to cover the corner cases to get an accurate judge of performance. Our method yielded the same performance as the exact method for all case studies. To demonstrate its functionality, we released an API that makes it easy to deploy our method in Web and app services. We assessed the scalability of our method by applying it to to-do lists with increasingly larger numbers of goals, sub-goals per goal, hierarchically nested levels of subgoals. We found that the method provided through our API is able to tackle fairly large to-do lists having a 576 tasks. This indicates that our method is suitable for real-world applications.

Improving Human Decision-Making by Discovering Efficient Strategies for Hierarchical Planning Artificial Intelligence

To make good decisions in the real world people need efficient planning strategies because their computational resources are limited. Knowing which planning strategies would work best for people in different situations would be very useful for understanding and improving human decision-making. But our ability to compute those strategies used to be limited to very small and very simple planning tasks. To overcome this computational bottleneck, we introduce a cognitively-inspired reinforcement learning method that can overcome this limitation by exploiting the hierarchical structure of human behavior. The basic idea is to decompose sequential decision problems into two sub-problems: setting a goal and planning how to achieve it. This hierarchical decomposition enables us to discover optimal strategies for human planning in larger and more complex tasks than was previously possible. The discovered strategies outperform existing planning algorithms and achieve a super-human level of computational efficiency. We demonstrate that teaching people to use those strategies significantly improves their performance in sequential decision-making tasks that require planning up to eight steps ahead. By contrast, none of the previous approaches was able to improve human performance on these problems. These findings suggest that our cognitively-informed approach makes it possible to leverage reinforcement learning to improve human decision-making in complex sequential decision-problems. Future work can leverage our method to develop decision support systems that improve human decision making in the real world.

Optimal to-do list gamification Artificial Intelligence

What should I work on first? What can wait until later? Which projects should I prioritize and which tasks are not worth my time? These are challenging questions that many people face every day. People's intuitive strategy is to prioritize their immediate experience over the long-term consequences. This leads to procrastination and the neglect of important long-term projects in favor of seemingly urgent tasks that are less important. Optimal gamification strives to help people overcome these problems by incentivizing each task by a number of points that communicates how valuable it is in the long-run. Unfortunately, computing the optimal number of points with standard dynamic programming methods quickly becomes intractable as the number of a person's projects and the number of tasks required by each project increase. Here, we introduce and evaluate a scalable method for identifying which tasks are most important in the long run and incentivizing each task according to its long-term value. Our method makes it possible to create to-do list gamification apps that can handle the size and complexity of people's to-do lists in the real world.

Automatic Discovery of Interpretable Planning Strategies Artificial Intelligence

When making decisions, people often overlook critical information or are overly swayed by irrelevant information. A common approach to mitigate these biases is to provide decision-makers, especially professionals such as medical doctors, with decision aids, such as decision trees and flowcharts. Designing effective decision aids is a difficult problem. We propose that recently developed reinforcement learning methods for discovering clever heuristics for good decision-making can be partially leveraged to assist human experts in this design process. One of the biggest remaining obstacles to leveraging the aforementioned methods for improving human decision-making is that the policies they learn are opaque to people. To solve this problem, we introduce AI-Interpret: a general method for transforming idiosyncratic policies into simple and interpretable descriptions. Our algorithm combines recent advances in imitation learning and program induction with a new clustering method for identifying a large subset of demonstrations that can be accurately described by a simple, high-performing decision rule. We evaluate our new AI-Interpret algorithm and employ it to translate information-acquisition policies discovered through metalevel reinforcement learning. The results of three large behavioral experiments showed that the provision of decision rules as flowcharts significantly improved people's planning strategies and decisions across three different classes of sequential decision problems. Furthermore, a series of ablation studies confirmed that our AI-Interpret algorithm was critical to the discovery of interpretable decision rules and that it is ready to be applied to other reinforcement learning problems. We conclude that the methods and findings presented in this article are an important step towards leveraging automatic strategy discovery to improve human decision-making.

Burn-in, bias, and the rationality of anchoring

Neural Information Processing Systems

Bayesian inference provides a unifying framework for addressing problems in machine learning, artificial intelligence, and robotics, as well as the problems facing the human mind. Unfortunately, exact Bayesian inference is intractable in all but the simplest models. Therefore minds and machines have to approximate Bayesian inference. Approximate inference algorithms can achieve a wide range of time-accuracy tradeoffs, but what is the optimal tradeoff? We investigate time-accuracy tradeoffs using the Metropolis-Hastings algorithm as a metaphor for the mind's inference algorithm(s).

Algorithm selection by rational metareasoning as a model of human strategy selection

Neural Information Processing Systems

Selecting the right algorithm is an important problem in computer science, because the algorithm often has to exploit the structure of the input to be efficient. The human mind faces the same challenge. Therefore, solutions to the algorithm selection problem can inspire models of human strategy selection and vice versa. Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. We apply our theory to model how people choose between cognitive strategies and test its prediction in a behavioral experiment.

Learning to select computations Artificial Intelligence

The efficient use of limited computational resources is an essential ingredient of intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but this is computationally intractable. Inspired by psychology and neuroscience, we propose the first concrete and domain-general learning algorithm for approximating the optimal selection of computations: Bayesian metalevel policy search (BMPS). We derive this general, sample-efficient search algorithm for a computation-selecting metalevel policy based on the insight that the value of information lies between the myopic value of information and the value of perfect information. We evaluate BMPS on three increasingly difficult metareasoning problems: when to terminate computation, how to allocate computation between competing options, and planning. Across all three domains, BMPS achieved near-optimal performance and compared favorably to previously proposed metareasoning heuristics. Finally, we demonstrate the practical utility of BMPS in an emergency management scenario, even accounting for the overhead of metareasoning.

When Does Bounded-Optimal Metareasoning Favor Few Cognitive Systems?

AAAI Conferences

While optimal metareasoning is notoriously intractable, humans are nonetheless able to adaptively allocate their computational resources. A possible approximation that humans may use to do this is to only metareason over a finite set of cognitive systems that perform variable amounts of computation. The highly influential "dual-process" accounts of human cognition, which postulate the coexistence of a slow accurate system with a fast error-prone system, can be seen as a special case of this approximation. This raises two questions: how many cognitive systems should a bounded optimal agent be equipped with and what characteristics should those systems have? We investigate these questions in two settings: a one-shot decision between two alternatives, and planning under uncertainty in a Markov decision process. We find that the optimal number of systems depends on the variability of the environment and the costliness of metareasoning. Consistent with dual-process theories, we also find that when having two systems is optimal, then the first system is fast but error-prone and the second system is slow but accurate.