Planning & Scheduling
Goal recognition via model-based and model-free techniques
Borrajo, Daniel, Gopalakrishnan, Sriram, Potluru, Vamsi K.
Humans interact with the world by performing actions driven by their inner motivations (goals). Some of those actions may be observable by financial institutions, which in turn may log them to better understand human behavior. Examples of such interactions include investment operations (buying or selling options), account-related activities (creating accounts, making transactions, withdrawing money), digital interactions (using the bank's web or mobile app to configure alerts or apply for a new credit card), and even illicit operations (such as fraud or money laundering). Once human behavior is better understood, financial institutions can improve their processes: deepening client relationships and offering targeted services (marketing), handling complaint-related interactions (operations), and performing fraud or money laundering investigations (compliance) [Borrajo et al., 2020].
Simulating and classifying behavior in adversarial environments based on action-state traces: an application to money laundering
Borrajo, Daniel, Veloso, Manuela, Shah, Sameena
Many business applications involve adversarial relationships in which both sides adapt their strategies to optimize their opposing benefits. A key characteristic of these applications is the wide range of strategies an adversary may choose as it dynamically adapts to sustain its benefits and evade authorities. In this paper, we present a novel way of approaching these applications, in particular in the context of Anti-Money Laundering. We provide a mechanism through which diverse, realistic, and new unobserved behavior can be generated in order to discover potential unobserved adversarial actions, enabling organizations to preemptively mitigate these risks. We make three main contributions. (a) We propose a novel behavior-based model, as opposed to the individual transaction-based models currently used by financial institutions, and introduce behavior traces as an enriched relational representation of observed human behavior. (b) We present a modelling approach that observes these traces and accurately infers the goals of actors by classifying their behavior as money laundering or standard behavior, despite significant unobserved activity. (c) We build a synthetic behavior simulator that can generate new, previously unseen traces; it offers a high degree of flexibility in its behavioral parameters so that we can challenge the detection algorithm. Finally, we provide experimental results showing that the learning module (automated investigator), despite having only partial observability, can still successfully infer from the traces the type of behavior, and thus the simulated goals, followed by customers: a key aspiration for many applications today.
Provenance-Based Assessment of Plans in Context
Friedman, Scott E., Goldman, Robert P., Freedman, Richard G., Kuter, Ugur, Geib, Christopher, Rye, Jeffrey
Many real-world planning domains involve diverse information sources, external entities, and agents of variable reliability, all of which may affect the confidence, risk, and sensitivity of plans. Humans reviewing a plan may lack context about these factors; however, this information is available during domain generation, which means it can also be woven into the planner and its resulting plans. This paper presents a provenance-based approach to explaining automated plans. Our approach (1) extends the SHOP3 HTN planner to generate dependency information, (2) transforms the dependency information into an established PROV-O representation, and (3) uses graph propagation and TMS-inspired algorithms to support dynamic and counterfactual assessment of information flow, confidence, and support. We qualify our approach's explanatory scope with respect to explanation targets from the automated planning and information analysis literatures, and we demonstrate its ability to assess a plan's pertinence, sensitivity, risk, assumption support, diversity, and relative confidence.
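As a rough illustration of graph-based confidence propagation, the sketch below pushes confidence through a small, hypothetical dependency graph using a weakest-link rule: a node's confidence is its own reliability times the minimum confidence of the nodes it depends on. The node names, reliability values, and the propagation rule are all assumptions for illustration; this is not the SHOP3 extension or the PROV-O encoding described above, and the TMS-inspired machinery is not modelled.

```python
from typing import Dict, List


def propagate_confidence(
    depends_on: Dict[str, List[str]], reliability: Dict[str, float]
) -> Dict[str, float]:
    """Weakest-link propagation over an acyclic dependency graph (assumed rule)."""
    memo: Dict[str, float] = {}

    def conf(node: str) -> float:
        if node not in memo:
            parents = depends_on.get(node, [])
            # A node is only as trustworthy as its weakest supporting input.
            weakest = min((conf(p) for p in parents), default=1.0)
            memo[node] = reliability.get(node, 1.0) * weakest
        return memo[node]

    for node in depends_on:
        conf(node)
    return memo


# Hypothetical provenance graph: node -> nodes it was derived from.
DEPENDS_ON = {
    "report_A": [],
    "report_B": [],
    "belief_route_clear": ["report_A", "report_B"],
    "step_move": ["belief_route_clear"],
    "step_deliver": ["step_move", "report_B"],
}
# Hypothetical source reliabilities; unlisted nodes default to 1.0.
RELIABILITY = {"report_A": 0.9, "report_B": 0.6}

baseline = propagate_confidence(DEPENDS_ON, RELIABILITY)
# Counterfactual: what if report_B were fully reliable?
what_if = propagate_confidence(DEPENDS_ON, {**RELIABILITY, "report_B": 1.0})
print(baseline["step_deliver"], what_if["step_deliver"])  # prints: 0.6 0.9
```

Re-running the propagation with a modified `reliability` table gives a crude counterfactual query: here, how much the final plan step would gain if `report_B` were fully trusted.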
Learning Vision-based Reactive Policies for Obstacle Avoidance
Aljalbout, Elie, Chen, Ji, Ritt, Konstantin, Ulmer, Maximilian, Haddadin, Sami
During task execution, robots should be able to operate in their workspaces without colliding with obstacles. In well-structured environments, this constraint can be easily ensured by carefully designing collision-free motion trajectories based on an understanding of the robot's surroundings. In contrast, unstructured environments present the challenge of autonomously reacting to previously unknown settings. Tackling this challenge requires additional effort to design perception systems capable of understanding the environment, as well as reactive strategies for avoiding obstacles. In this work, we are concerned with obstacle avoidance for robot manipulators. Beyond the challenges mentioned above, such systems impose additional constraints, such as joint limits, singularities, and self-collision. All of these aspects add to the complexity of the problem and require care in the formulation of both classical and learning-based methods. In this context, proprioceptive robot sensors enable collision detection and early reactions, which can prevent substantial damage to the robot and its environment [1].
Interleaving Fast and Slow Decision Making
Gulati, Aditya, Soni, Sarthak, Rao, Shrisha
The "Thinking, Fast and Slow" paradigm of Kahneman proposes that we use two different styles of thinking: a fast and intuitive System 1 for certain tasks, and a slower but more analytical System 2 for others. While this two-system style of thinking is gaining popularity in AI and robotics, our work considers how to interleave the two styles of decision-making, i.e., how System 1 and System 2 should be used together. To this end, we propose a novel and general framework that includes a new System 0 to oversee Systems 1 and 2. Whenever a decision needs to be made, System 0 evaluates the situation and quickly hands the decision-making process over to either System 1 or System 2. We evaluate this framework on a modified version of the classic Pac-Man game, with an already-trained RL algorithm for System 1, a Monte-Carlo tree search for System 2, and several possible strategies for System 0. As expected, arbitrary switches between Systems 1 and 2 do not work, but certain strategies do well. With System 0, an agent performs better than one that uses only System 1 or System 2.
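The hand-over performed by System 0 can be sketched as a small arbitration object. The strategy below (defer to System 1 whenever its self-reported confidence clears a threshold, otherwise call System 2) is only one hypothetical example of a System 0 strategy; `system1`, `system2`, `threshold`, and the toy policies are illustrative stand-ins, not the paper's RL agent, Monte-Carlo tree search, or switching rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

State = str
Action = str


@dataclass
class System0:
    # Fast, intuitive system: returns an action and a confidence in [0, 1].
    system1: Callable[[State], Tuple[Action, float]]
    # Slow, deliberative system (e.g. a tree search); assumed expensive.
    system2: Callable[[State], Action]
    threshold: float = 0.8
    calls: Dict[str, int] = field(
        default_factory=lambda: {"system1": 0, "system2": 0}
    )

    def decide(self, state: State) -> Action:
        action, confidence = self.system1(state)
        if confidence >= self.threshold:
            self.calls["system1"] += 1
            return action
        # Low confidence: hand the decision over to the slow system.
        self.calls["system2"] += 1
        return self.system2(state)


# Toy stand-ins for a Pac-Man-like setting (purely hypothetical).
def fast_policy(state: State) -> Tuple[Action, float]:
    return ("left", 0.95) if "open" in state else ("left", 0.3)


def slow_search(state: State) -> Action:
    return "right"  # pretend a search found the safer move


agent = System0(fast_policy, slow_search)
print(agent.decide("open corridor"), agent.decide("ghost nearby"), agent.calls)
# prints: left right {'system1': 1, 'system2': 1}
```

Because System 1 is queried on every step here, this design assumes its forward pass is cheap compared with a System 2 search; other System 0 strategies could decide without consulting System 1 at all.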
Model Minimization For Online Predictability
Gopalakrishnan, Sriram, Kambhampati, Subbarao
For humans in a teaming scenario, context switching between reasoning about a teammate's behavior and thinking about their own task can slow them down, especially if the cognitive cost of predicting the teammate's actions is high. If the prediction of a robot teammate's actions can be made quicker, the human can be more productive. In this paper, we present an approach to constrain the actions of a robot so as to increase predictability (specifically, online predictability) while keeping the robot's plan costs within acceptable limits. Existing work on human-robot interaction does not consider the computational cost of predictability, which our approach does. We approach this problem from the perspective of directed graph minimization and connect the concept of predictability to the out-degree of vertices. We present an algorithm to minimize graphs for predictability and contrast it with minimization for legibility (goal inference) and for optimality.
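To make the link between predictability and out-degree concrete, the sketch below greedily deletes edges from a weighted directed graph (most expensive first) and keeps a deletion only if every task's shortest-path cost stays within a `slack` factor of its original optimum. Fewer outgoing edges per vertex means fewer candidate next moves for a human to consider. The greedy rule, the `tasks` list, `slack`, and the example graph are assumptions for illustration; this is not the minimization algorithm from the paper.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # graph[u][v] = cost of edge u -> v


def shortest(graph: Graph, src: str, dst: str) -> float:
    """Dijkstra shortest-path cost from src to dst (inf if unreachable)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def minimize_for_predictability(
    graph: Graph, tasks: List[Tuple[str, str]], slack: float
) -> Graph:
    """Greedily drop edges while each task's cost stays <= (1 + slack) * optimal."""
    pruned = {u: dict(nbrs) for u, nbrs in graph.items()}
    budget = {t: (1.0 + slack) * shortest(graph, *t) for t in tasks}
    edges = sorted(
        ((c, u, v) for u, nbrs in graph.items() for v, c in nbrs.items()),
        reverse=True,  # try removing the most expensive edges first
    )
    for cost, u, v in edges:
        del pruned[u][v]
        if any(shortest(pruned, s, t) > budget[(s, t)] for s, t in tasks):
            pruned[u][v] = cost  # removal broke a task's cost bound: restore
    return pruned


G = {"A": {"B": 1.0, "C": 1.2, "D": 3.0}, "B": {"D": 1.0}, "C": {"D": 1.0}}
H = minimize_for_predictability(G, tasks=[("A", "D")], slack=0.0)
print(H)  # {'A': {'B': 1.0}, 'B': {'D': 1.0}, 'C': {}}
```

Vertex `A` drops from three outgoing edges to one while the A-to-D cost stays optimal. Raising `slack` loosens the cost bound, which typically lets more edges be removed at the price of longer plans.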
AM-RRT*: Informed Sampling-based Planning with Assisting Metric
Armstrong, Daniel, Jonasson, André
In this paper, we present a new algorithm that extends RRT* and RT-RRT* for online path planning in complex, dynamic environments. Sampling-based approaches often perform poorly in environments with narrow passages, a feature common to many indoor applications of mobile robots as well as computer games. Our method extends RRT-based sampling methods to enable the use of an assisting distance metric that improves performance in environments with obstacles. This assisting metric, which can be any metric with better properties than the Euclidean metric when line of sight is blocked, is used in combination with the standard Euclidean metric so that the algorithm benefits from the assisting metric while maintaining the desirable properties of previous RRT variants, namely probabilistic completeness in tree coverage and asymptotic optimality in path length. We also introduce a new method of targeted rewiring, aimed at shortening search times and path lengths in tasks where the goal shifts repeatedly. We demonstrate that our method offers considerable improvements over existing multi-query planners such as RT-RRT* when using diffusion distance as an assisting metric, finding near-optimal paths with search times reduced by several orders of magnitude. Experimental results show planning times reduced by 99.5% and path lengths by 9.8% relative to existing real-time RRT planners in a variety of environments.
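One simple way to combine an assisting metric with the Euclidean one, in the spirit of the description above, is to use the assisting metric only when line of sight is blocked. The sketch below does this on a small occupancy grid, using breadth-first-search geodesic distance as a stand-in for the diffusion distance used in the paper, and compares nearest-node selection under both metrics. The grid, tree nodes, and switching rule are hypothetical, and the sketch omits sampling, rewiring, and the rest of AM-RRT*.

```python
import math
from collections import deque
from typing import Tuple

Cell = Tuple[int, int]  # (row, col)

GRID = [
    "..........",
    "....#.....",
    "....#.....",
    "....#.....",
    "..........",
]


def free(cell: Cell) -> bool:
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#"


def line_of_sight(a: Cell, b: Cell) -> bool:
    """Sample points along the segment a-b and check that no cell is a wall."""
    steps = 2 * max(abs(b[0] - a[0]), abs(b[1] - a[1]), 1)
    for i in range(steps + 1):
        t = i / steps
        cell = (round(a[0] + t * (b[0] - a[0])), round(a[1] + t * (b[1] - a[1])))
        if not free(cell):
            return False
    return True


def geodesic(a: Cell, b: Cell) -> float:
    """4-connected BFS path length around obstacles (assisting-metric stand-in)."""
    frontier, seen = deque([(a, 0)]), {a}
    while frontier:
        (r, c), d = frontier.popleft()
        if (r, c) == b:
            return float(d)
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if free(nxt) and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return math.inf


def combined(a: Cell, b: Cell) -> float:
    # Euclidean when the straight line is clear, assisting metric otherwise.
    return math.dist(a, b) if line_of_sight(a, b) else geodesic(a, b)


tree_nodes = [(2, 3), (4, 9)]
goal = (2, 6)
print(min(tree_nodes, key=lambda n: math.dist(n, goal)))  # (2, 3), behind the wall
print(min(tree_nodes, key=lambda n: combined(n, goal)))   # (4, 9)
```

Both distances are measured in grid-cell units here, which keeps them roughly comparable; with a metric such as diffusion distance, some rescaling would presumably be needed before mixing the two.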
Formally Verified SAT-Based AI Planning
Abdulaziz, Mohammad, Kurz, Friedrich
As witnessed by the different planning competitions (Long 2000; Coles et al. 2012; Vallati et al. 2015), planning algorithms and systems are becoming more and more scalable and efficient, which makes them suited for more realistic applications. Given that many applications of planning are safety-critical, increasing the trustworthiness of planning algorithms and systems (i.e., the likelihood that they compute correct results) could be instrumental in their …
In the realm of planning, this approach was pioneered by Howey, Long, and Fox, who developed VAL (Howey, Long, and Fox 2004), a tool that, given a planning problem and a potential solution, certifies that the solution actually solves the given problem. Certifying unsolvability for planning was also tackled by Eriksson, Röger, and Helmert (2017), who provided unsolvability certificates and checkers for state-space search algorithms, and by Eriksson and Helmert (2020) for property-directed SAT-based planning.
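The idea behind a plan validator such as VAL, certifying that a candidate plan actually solves a given problem, can be illustrated with a minimal checker for STRIPS-style actions (preconditions, add lists, delete lists). This sketch is not formally verified, ignores PDDL features such as typing, conditional effects, and durative actions, and its `Action` type and example domain are hypothetical; it is neither VAL nor the verified encoding the paper develops.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]     # facts that must hold before applying
    add: FrozenSet[str]     # facts made true
    delete: FrozenSet[str]  # facts made false


def validate(
    init: Iterable[str], goal: Iterable[str], plan: List[Action]
) -> Tuple[bool, str]:
    """Simulate the plan step by step and check the goal holds at the end."""
    state = set(init)
    for i, act in enumerate(plan):
        missing = act.pre - state
        if missing:
            return False, f"step {i} ({act.name}): unmet preconditions {sorted(missing)}"
        state = (state - act.delete) | act.add  # delete-then-add semantics
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: {sorted(unmet)}"
    return True, "plan valid"


move = Action("move(a,b)", frozenset({"at(a)"}), frozenset({"at(b)"}), frozenset({"at(a)"}))
pick = Action(
    "pick(pkg,b)",
    frozenset({"at(b)", "pkg_at(b)"}),
    frozenset({"holding(pkg)"}),
    frozenset({"pkg_at(b)"}),
)

init, goal = {"at(a)", "pkg_at(b)"}, {"holding(pkg)"}
print(validate(init, goal, [move, pick]))  # (True, 'plan valid')
print(validate(init, goal, [pick, move]))  # fails at step 0: at(b) does not hold yet
```

A checker like this only confirms solvability for one given plan; certifying that no plan exists, as in the unsolvability work cited above, requires a different kind of certificate.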
Forethought and Hindsight in Credit Assignment
Chelu, Veronica, Precup, Doina, van Hasselt, Hado
We address the problem of credit assignment in reinforcement learning and explore fundamental questions about how an agent can best use additional computation to propagate new information by planning with internal models of the world to improve its predictions. In particular, we seek to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations, and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, focusing primarily on the selection of states in which predictions should be (re-)evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that range from explicit environment-dynamics predictors to more abstract planner-aware models.
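The contrast between forethought and hindsight can be sketched with tabular TD(0) value prediction on a deterministic chain. Forethought re-evaluates chosen states by looking ahead with a forward model (state to reward and next state); hindsight pushes a freshly changed value back to predecessor states through a backward model (next state to predecessors and rewards). The chain, the assumed-known models, and the constants `GAMMA` and `ALPHA` are illustrative assumptions, not the paper's experimental setup.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

GAMMA, ALPHA = 0.9, 1.0  # discount and step size (assumed)

# Deterministic chain 0 -> 1 -> 2 -> 3 (terminal); reward 1 on entering state 3.
FORWARD: Dict[int, Tuple[float, int]] = {0: (0.0, 1), 1: (0.0, 2), 2: (1.0, 3)}
BACKWARD: Dict[int, List[Tuple[int, float]]] = {
    1: [(0, 0.0)], 2: [(1, 0.0)], 3: [(2, 1.0)]
}


def td_update(V: Dict[int, float], s: int, r: float, s_next: int) -> None:
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])


def forethought(V: Dict[int, float], forward, states: Iterable[int]) -> None:
    """Re-evaluate the chosen states by one-step lookahead with the forward model."""
    for s in states:
        r, s_next = forward[s]
        td_update(V, s, r, s_next)


def hindsight(V: Dict[int, float], backward, changed: int, max_updates: int = 100) -> None:
    """Propagate a changed value to predecessors via the backward model."""
    queue, n = deque([changed]), 0
    while queue and n < max_updates:
        s_next = queue.popleft()
        for s, r in backward.get(s_next, []):
            td_update(V, s, r, s_next)
            queue.append(s)
            n += 1


def after_real_step() -> Dict[int, float]:
    V = {s: 0.0 for s in range(4)}
    td_update(V, 2, 1.0, 3)  # one real transition reveals the reward
    return V


V_hind = after_real_step()
hindsight(V_hind, BACKWARD, changed=2)

V_fore = after_real_step()
forethought(V_fore, FORWARD, states=[0, 1])  # poorly ordered lookahead

print(V_hind)  # value reaches state 0 in one backward sweep
print(V_fore)  # state 0 still 0.0: it was updated before state 1 changed
```

Running forethought in the order `[1, 0]` instead also brings state 0 to 0.81, which illustrates why the selection and ordering of states to (re-)evaluate matters when planning with forward models.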
Robust Hierarchical Planning with Policy Delegation
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation. This framework, the Markov Intent Process, features a collection of skills, each specialised to perform a single task well. Skills are aware of their intended effects and can analyse planning goals to delegate planning to the best-suited skill. This principle dynamically creates a hierarchy of plans, in which each skill plans for the sub-goals it is specialised in. The proposed planning method features on-demand execution: skill policies are evaluated only when needed. Plans are generated only at the highest level, then expanded and optimised once the latest state information is available. The high-level plan retains the initial planning intent and previously computed skills, effectively reducing the computation needed to adapt to environmental changes. We show experimentally that this planning approach is very competitive with classic planning and reinforcement learning techniques on a variety of domains, in terms of both solution length and planning time.
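The delegation principle can be sketched as a recursive function: each skill declares the effects it is specialised for and the sub-goals it needs first, and a goal is handed to the most specialised skill whose declared effects cover it. The `Skill` fields, the "fewest declared effects" notion of best-suited, and the delivery example are hypothetical; unlike the on-demand expansion described above, this sketch expands the whole hierarchy eagerly for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple

# A plan node: (skill name, goal it achieves, sub-plans for its prerequisites).
PlanNode = Tuple[str, str, list]


@dataclass(frozen=True)
class Skill:
    name: str
    effects: FrozenSet[str]      # goals this skill is specialised to achieve
    needs: Tuple[str, ...] = ()  # ordered sub-goals to satisfy first


def delegate(
    goal: str, skills: List[Skill], state: Set[str], depth: int = 0
) -> Optional[PlanNode]:
    if goal in state:
        return None  # already satisfied, nothing to plan
    if depth > 10:
        raise RecursionError("delegation too deep (cyclic skill definitions?)")
    candidates = [s for s in skills if goal in s.effects]
    if not candidates:
        raise ValueError(f"no skill can achieve {goal!r}")
    skill = min(candidates, key=lambda s: len(s.effects))  # most specialised
    subplans = [delegate(g, skills, state, depth + 1) for g in skill.needs]
    return (skill.name, goal, [p for p in subplans if p is not None])


def execution_order(node: Optional[PlanNode]) -> List[str]:
    """Post-order traversal: prerequisites run before the skill that needs them."""
    if node is None:
        return []
    name, goal, children = node
    order = [step for child in children for step in execution_order(child)]
    return order + [f"{name}->{goal}"]


SKILLS = [
    Skill("Navigate", frozenset({"at_depot", "at_customer"})),
    Skill("Pick", frozenset({"holding_package"}), ("at_depot",)),
    Skill("Deliver", frozenset({"package_delivered"}), ("holding_package", "at_customer")),
]
plan = delegate("package_delivered", SKILLS, state=set())
print(execution_order(plan))
# ['Navigate->at_depot', 'Pick->holding_package', 'Navigate->at_customer', 'Deliver->package_delivered']
```

Passing a different `state` (for example `{"at_depot"}`) prunes the sub-plans that are already satisfied, which loosely mirrors how re-expanding a retained high-level plan against fresh state information can avoid recomputing everything.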