Collaborating Authors: Potts, Colin M.


Potential-Based Reward Shaping For Intrinsic Motivation

arXiv.org Artificial Intelligence

Recently, there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was developed for. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general set of functions than has been previously proven. We also present Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a potential-based form that is usable without altering the set of optimal policies. Testing in the MiniGrid DoorKey and Cliff Walking environments, we demonstrate that PBIM successfully prevents the agent from converging to a suboptimal policy and can speed up training.
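
For context, the standard potential-based shaping construction that the paper's extension generalizes adds, for a potential function Φ defined over states and a discount factor γ, the shaping term

    F(s, s') = \gamma \, \Phi(s') - \Phi(s)

to the environment reward, which provably leaves the set of optimal policies unchanged. PBIM's contribution is obtaining a comparable guarantee for trainable intrinsic-motivation rewards that depend on more than the current state, which this one-line sketch does not capture.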


Modeling Risk in Reinforcement Learning: A Literature Mapping

arXiv.org Artificial Intelligence

Safe RL approaches are based on specific risk representations for particular problems or domains. In order to analyze agent behaviors, compare safe RL approaches, and effectively transfer techniques between application domains, it is necessary to understand the types of risk specific to safe RL problems. We performed a systematic literature mapping with the objective of characterizing risk in safe RL. Based on the obtained results, we present definitions, characteristics, and types of risk that hold across multiple application domains. Our literature mapping covers literature from the last 5 years (2017-2022), from a variety of knowledge areas (AI, finance, engineering, medicine) where RL approaches emphasize risk representation and management. Our mapping covers 72 papers filtered systematically from thousands of papers on the topic. Our proposed notion of risk covers a variety of representations, disciplinary differences, common training exercises, and types of techniques. We encourage researchers to include explicit and detailed accounts of risk in future safe RL research reports, using this mapping as a starting point. With this information, researchers and practitioners could draw stronger conclusions about the effectiveness of techniques on different problems.


Plotter: Operationalizing the Master Book of All Plots

AAAI Conferences

Pulp fiction author William Wallace Cook published Plotto: The Master Book of All Plots in 1928; it contains almost 2000 plot fragments and relatively formal instructions on how human authors could combine them to produce the plots behind novels. In this paper we show one way that the methods in this book can be used to computationally generate plots from the fragments. We also show sample plots generated by our system, Plotter, which uses this method. Finally, we use these plots to discuss idiosyncrasies and limitations of the book.
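
As a rough illustration of the fragment-chaining idea, the sketch below strings together a handful of invented plot fragments by following declared follow-on links; everything in it (the fragment texts, numbering, and linking scheme) is hypothetical, and Plotto's actual apparatus, which Plotter operationalizes, is considerably richer.

```python
import random

# Toy sketch of chaining plot fragments via declared follow-on links.
# The fragment texts, numbering, and linking scheme below are invented for
# illustration only; Plotto's actual notation (conflict situations with
# lead-ups and carry-ons) and Plotter's treatment of it are richer.

FRAGMENTS = {
    1: ("A, in want, resolves to change his circumstances", [2, 3]),
    2: ("A undertakes a hazardous enterprise in order to do so", [4]),
    3: ("A seeks the help of a stranger, B", [4]),
    4: ("A succeeds, but at an unexpected personal cost", []),
}


def generate_plot(start=1, seed=0):
    rng = random.Random(seed)
    plot, current = [], start
    while True:
        text, follow_ons = FRAGMENTS[current]
        plot.append(text)
        if not follow_ons:
            return plot
        current = rng.choice(follow_ons)


print(" / ".join(generate_plot()))
```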


Improving Trust Estimates in Planning Domains with Rare Failure Events

AAAI Conferences

In many planning domains, it is impossible to construct plans that are guaranteed to keep the system completely safe. A common approach is to build probabilistic plans that are guaranteed to maintain system safety with a sufficiently high probability. For many such domains, bounds on system safety cannot be computed analytically, but instead rely on execution sampling coupled with plan verification techniques. While probabilistic planning with verification can work well, it is not adequate in situations in which some modes of failure are very rare, simply because too many execution traces must be sampled (e.g., 10^12) to ensure that the rare events of interest will occur even once. The P-CIRCA planner seeks to solve planning problems while probabilistically guaranteeing safety. Our domains frequently involve verifying that the probability of failure is below a low threshold (< 0.01). Because the events we sample have such low probabilities, we use importance sampling (IS) (Hammersley and Handscomb 1964; Clarke and Zuliani 2011) to reduce the number of samples required. However, since we deal with an abstracted model, we cannot bias all paths individually, which prevents IS from achieving a correct bias. To compensate for this drawback, we present the concept of DAGification, which partially expands our representation to achieve a better bias.
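
To make concrete why importance sampling is used here, the sketch below estimates a rare-event probability for a toy Gaussian model by sampling from a biased proposal distribution and reweighting by the likelihood ratio. It illustrates only the general IS idea, not P-CIRCA's verification procedure or the DAGification technique; the model, threshold, and sample size are arbitrary choices for the example.

```python
import numpy as np

# Toy illustration of importance sampling (IS) for rare-event estimation,
# not the P-CIRCA planner's actual verification procedure. We estimate
# p = P(X > 4) for X ~ N(0, 1), a probability of roughly 3.2e-5, which
# naive Monte Carlo would need millions of samples to observe even once.

rng = np.random.default_rng(0)
threshold = 4.0
n = 10_000

# Naive Monte Carlo: sample from the nominal distribution N(0, 1).
naive = rng.standard_normal(n)
p_naive = np.mean(naive > threshold)

# Importance sampling: sample from a proposal N(threshold, 1) under which
# the failure event is common, then reweight each sample by the likelihood
# ratio of the nominal density to the proposal density.
proposal = rng.standard_normal(n) + threshold
log_w = -0.5 * proposal**2 + 0.5 * (proposal - threshold) ** 2
weights = np.exp(log_w)
p_is = np.mean((proposal > threshold) * weights)

print(f"naive estimate: {p_naive:.2e}")  # usually 0 with only 10k samples
print(f"IS estimate:    {p_is:.2e}")     # close to the true ~3.2e-5
```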


Iterative-Expansion A*

AAAI Conferences

In this paper we describe an improvement to the popular IDA* search algorithm that emphasizes a different space-for-time trade-off than previously suggested. In particular, our algorithm, called Iterative-Expansion A* (IEA*), focuses on reducing redundant node expansions within individual depth-first search (DFS) iterations of IDA* by employing a relatively small amount of available memory, bounded by the error in the heuristic, to store selected nodes. The additional memory required is exponential not in the solution depth, but only in the difference between the solution depth and the estimated solution depth. A constant-time hash set lookup can then be used to prune entire subtrees as DFS proceeds. Overall, we show 2- to 26-fold time speedups versus an optimized version of IDA* across several domains, and compare IEA* with several other competing approaches. We also sketch proofs of optimality and completeness for IEA*, and note that IEA* is particularly efficient for solving implicitly defined general graph search problems.
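
The sketch below gives a minimal illustration of the general idea on an invented toy graph with unit edge costs: an IDA*-style depth-first search that keeps an in-iteration hash map of visited states and prunes any subtree whose root was already reached at least as cheaply. It is not the IEA* algorithm itself; IEA*'s choice of which nodes to store and its memory bound are as described in the paper.

```python
from math import inf

# IDA*-style search with an in-iteration hash map used to prune subtrees
# whose root state was already reached at equal or lower cost. This trades
# a small amount of memory for fewer redundant expansions; it simplifies
# the node-storage policy relative to IEA*.

# Hypothetical toy graph: unit edge costs and an admissible heuristic to G.
GRAPH = {
    "S": ["A", "B"],
    "A": ["C", "G"],
    "B": ["C"],
    "C": ["G"],
    "G": [],
}
H = {"S": 2, "A": 1, "B": 2, "C": 1, "G": 0}


def ida_star(start, goal):
    bound = H[start]
    while True:
        seen = {}  # state -> best g found in this DFS iteration
        t = _dfs(start, goal, 0, bound, seen)
        if isinstance(t, list):  # found a solution path
            return t
        if t == inf:             # search space exhausted, no solution
            return None
        bound = t                # next iteration's f-cost bound


def _dfs(state, goal, g, bound, seen):
    f = g + H[state]
    if f > bound:
        return f
    if state == goal:
        return [state]
    # Prune this subtree if the state was already reached at least as
    # cheaply in the current iteration: a constant-time hash lookup.
    if state in seen and seen[state] <= g:
        return inf
    seen[state] = g
    minimum = inf
    for nxt in GRAPH[state]:
        t = _dfs(nxt, goal, g + 1, bound, seen)
        if isinstance(t, list):
            return [state] + t
        minimum = min(minimum, t)
    return minimum


print(ida_star("S", "G"))  # -> ['S', 'A', 'G']
```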