Planning & Scheduling
A Survey of Behavior Trees in Robotics and AI
Iovino, Matteo, Scukins, Edvards, Styrud, Jonathan, Ögren, Petter, Smith, Christian
Behavior Trees (BTs) were invented as a tool to enable modular AI in computer games, but have received an increasing amount of attention in the robotics community in the last decade. With rising demands on agent AI complexity, game programmers found that the Finite State Machines (FSM) that they used scaled poorly and were difficult to extend, adapt and reuse. In BTs, the state transition logic is not dispersed across the individual states, but organized in a hierarchical tree structure, with the states as leaves. This has a significant effect on modularity, which in turn simplifies both synthesis and analysis by humans and algorithms alike. These advantages are needed not only in game AI design, but also in robotics, as is evident from the research being done. In this paper we present a comprehensive survey of the topic of BTs in Artificial Intelligence and Robotic applications. The existing literature is described and categorized based on methods, application areas and contributions, and the paper is concluded with a list of open research challenges.
Goal Recognition over Imperfect Domain Models
Goal recognition is the problem of recognizing the intended goal of autonomous agents or humans by observing their behavior in an environment. Over the past years, most existing approaches to goal and plan recognition have been ignoring the need to deal with imperfections regarding the domain model that formalizes the environment where autonomous agents behave. In this thesis, we introduce the problem of goal recognition over imperfect domain models, and develop solution approaches that explicitly deal with two distinct types of imperfect domains models: (1) incomplete discrete domain models that have possible, rather than known, preconditions and effects in action descriptions; and (2) approximate continuous domain models, where the transition function is approximated from past observations and not well-defined. We develop novel goal recognition approaches over imperfect domains models by leveraging and adapting existing recognition approaches from the literature. Experiments and evaluation over these two types of imperfect domains models show that our novel goal recognition approaches are accurate in comparison to baseline approaches from the literature, at several levels of observability and imperfections.
Argument Schemes for Explainable Planning
Mahesar, Quratul-ain, Parsons, Simon
Artificial Intelligence (AI) is being increasingly used to develop systems that produce intelligent solutions. However, there is a major concern that whether the systems built will be trusted by humans. In order to establish trust in AI systems, there is a need for the user to understand the reasoning behind their solutions and therefore, the system should be able to explain and justify its output. In this paper, we use argumentation to provide explanations in the domain of AI planning. We present argument schemes to create arguments that explain a plan and its components; and a set of critical questions that allow interaction between the arguments and enable the user to obtain further information regarding the key elements of the plan. Finally, we present some properties of the plan arguments.
BCI-Controlled Hands-Free Wheelchair Navigation with Obstacle Avoidance
Mounir, Ramy, Alqasemi, Redwan, Dubey, Rajiv
Brain-Computer interfaces (BCI) are widely used in reading brain signals and converting them into real-world motion. However, the signals produced from the BCI are noisy and hard to analyze. This paper looks specifically towards combining the BCI's latest technology with ultrasonic sensors to provide a hands-free wheelchair that can efficiently navigate through crowded environments. This combination provides safety and obstacle avoidance features necessary for the BCI Navigation system to gain more confidence and operate the wheelchair at a relatively higher velocity. A population of six human subjects tested the BCI-controller and obstacle avoidance features. Subjects were able to mentally control the destination of the wheelchair, by moving the target from the starting position to a predefined position, in an average of 287.12 seconds and a standard deviation of 48.63 seconds after 10 minutes of training. The wheelchair successfully avoided all obstacles placed by the subjects during the test.
The More the Merrier?! Evaluating the Effect of Landmark Extraction Algorithms on Landmark-Based Goal Recognition
Gusmão, Kin Max Piamolini, Pereira, Ramon Fraga, Meneguzzi, Felipe
Recent approaches to goal and plan recognition using classical planning domains have achieved state of the art results in terms of both recognition time and accuracy by using heuristics based on planning landmarks. To achieve such fast recognition time these approaches use efficient, but incomplete, algorithms to extract only a subset of landmarks for planning domains and problems, at the cost of some accuracy. In this paper, we investigate the impact and effect of using various landmark extraction algorithms capable of extracting a larger proportion of the landmarks for each given planning problem, up to exhaustive landmark extraction. We perform an extensive empirical evaluation of various landmark-based heuristics when using different percentages of the full set of landmarks. Results show that having more landmarks does not necessarily mean achieving higher accuracy and lower spread, as the additional extracted landmarks may not necessarily increase be helpful towards the goal recognition task.
Generalized Planning With Deep Reinforcement Learning
Rivlin, Or, Hazan, Tamir, Karpas, Erez
Classical Planning is concerned with finding plans, or sequences of actions, that when applied to some initial condition specified by a set of logical predicates, will bring the environment to a state that satisfies a set of goal predicates. This is usually performed by some heuristic search procedure, and the resulting plan is applicable only to the specific instance that was solved. However, a possibly stronger outcome would be to find some sort of higher level plan that can solve many instances that belong to the same domain, and thus share an underlying structure. The study of methods that can discover such higher level plans is called Generalized Planning. Generalized plans do not necessarily exist for all classical planning domains, but finding such solutions for domains in which it is possible could obviate the need to perform compute intensive search in cases where we only wish to find a goal satisfying solution. To give an example of such a generalized plan, let us consider a simplified Blocksworld domain. In this domain there are unique blocks that can be either stacked on each other or strewn about the floor, and the goal is to stack and unstack blocks such that we arrive at a goal configuration from an initial configuration. Finding a plan that does so in an optimal number of steps is generally NPhard [10], but finding a plan that satisfies the goal regardless of cost can be done in polynomial time in the following manner: 1. Unstack all the blocks so that they are scattered on the floor 2. stack the block according to the goal configuration, beginning with the lower blocks This strategy is not optimal since we might unstack blocks that are already in their proper place according to the goal specification, but it will yield a goal satisfying plan for every instance in this simplified Blocksworld domain.
ASNets: Deep Learning for Generalised Planning
Toyer, Sam (UC Berkeley) | Thiébaux, Sylvie (Australian National University) | Trevizan, Felipe (Australian National University) | Xie, Lexing (Australian National University)
In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.
Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning
Parascandolo, Giambattista, Buesing, Lars, Merel, Josh, Hasenclever, Leonard, Aslanides, John, Hamrick, Jessica B., Heess, Nicolas, Neitz, Alexander, Weber, Theophane
Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partitions trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.
Flexible and Efficient Long-Range Planning Through Curious Exploration
Curtis, Aidan, Xin, Minjian, Arumugam, Dilip, Feigelis, Kevin, Yamins, Daniel
Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions for the effects and preconditions for actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.