Lozano-Pérez, Tomás
Active model learning and diverse action sampling for task and motion planning
Wang, Zi, Garrett, Caelan Reed, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás
The objective of this work is to augment the basic abilities of a robot by learning to use new sensorimotor primitives to enable the solution of complex long-horizon problems. Solving long-horizon problems in complex domains requires flexible generative planning that can combine primitive abilities in novel combinations to solve problems as they arise in the world. In order to plan to combine primitive actions, we must have models of the preconditions and effects of those actions: under what circumstances will executing this primitive achieve some particular effect in the world? We use, and develop novel improvements on, state-of-the-art methods for active learning and sampling. We use Gaussian process methods for learning the conditions of operator effectiveness from small numbers of expensive training examples collected by experimentation on a robot. We develop adaptive sampling methods for generating diverse elements of continuous sets (such as robot configurations and object poses) during planning for solving a new task, so that planning is as efficient as possible. We demonstrate these methods in an integrated system, combining newly learned models with an efficient continuous-space robot task and motion planner to learn to solve long-horizon problems more efficiently than was previously possible.
Modular meta-learning
Alet, Ferran, Lozano-Pérez, Tomás, Kaelbling, Leslie P.
Many prediction problems, such as those that arise in the context of robotics, have a simplifying underlying structure that could accelerate learning. In this paper, we present a strategy for learning a set of neural network modules that can be combined in different ways. We train different modular structures on a set of related tasks and generalize to new tasks by composing the learned modules in new ways. We show this improves performance in two robotics-related problems.
Integrating Human-Provided Information Into Belief State Representation Using Dynamic Factorization
Chitnis, Rohan, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás
In partially observed environments, it can be useful for a human to provide the robot with declarative information that represents probabilistic relational constraints on properties of objects in the world, augmenting the robot's sensory observations. For instance, a robot tasked with a search-and-rescue mission may be informed by the human that two victims are probably in the same room. An important question arises: how should we represent the robot's internal knowledge so that this information is correctly processed and combined with raw sensory information? In this paper, we provide an efficient belief state representation that dynamically selects an appropriate factoring, combining aspects of the belief when they are correlated through information and separating them when they are not. This strategy works in open domains, in which the set of possible objects is not known in advance, and provides significant improvements in inference time over a static factoring, leading to more efficient planning for complex partially observed tasks. We validate our approach experimentally in two open-domain planning problems: a 2D discrete gridworld task and a 3D continuous cooking task.
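The idea of merging belief factors only when information correlates them can be illustrated with a minimal sketch. This is not the authors' implementation: the class `FactoredBelief`, its methods, and the discrete tabular representation are all hypothetical choices made for illustration, and the sketch covers only the merging half of the strategy (splitting factors again is omitted).

```python
from itertools import product


class FactoredBelief:
    """Hypothetical discrete belief kept as independent factors.

    Each factor is a pair (variables, table), where table maps a tuple of
    values (one per variable) to a probability.
    """

    def __init__(self):
        self.factors = []

    def add_variable(self, name, dist):
        # A new object property starts in its own single-variable factor.
        self.factors.append(((name,), {(v,): p for v, p in dist.items()}))

    def _find(self, name):
        for i, (variables, _) in enumerate(self.factors):
            if name in variables:
                return i
        raise KeyError(name)

    def apply_constraint(self, a, b, likelihood):
        """Condition on a soft relational constraint between a and b.

        The two factors are joined only if a and b are not already
        in the same factor; all other factors stay untouched.
        """
        i, j = self._find(a), self._find(b)
        if i != j:
            (va, ta), (vb, tb) = self.factors[i], self.factors[j]
            joint = {ka + kb: pa * pb
                     for (ka, pa), (kb, pb) in product(ta.items(), tb.items())}
            self.factors = [f for k, f in enumerate(self.factors)
                            if k not in (i, j)]
            self.factors.append((va + vb, joint))
            i = len(self.factors) - 1
        variables, table = self.factors[i]
        ia, ib = variables.index(a), variables.index(b)
        weighted = {k: p * likelihood(k[ia], k[ib]) for k, p in table.items()}
        total = sum(weighted.values())
        if total == 0.0:
            raise ValueError("constraint is inconsistent with the belief")
        self.factors[i] = (variables, {k: p / total for k, p in weighted.items()})

    def marginal(self, name):
        variables, table = self.factors[self._find(name)]
        idx = variables.index(name)
        out = {}
        for k, p in table.items():
            out[k[idx]] = out.get(k[idx], 0.0) + p
        return out


# Usage: three victims with uniform location beliefs; a human reports that
# victim1 and victim2 are probably (0.9) in the same room.
belief = FactoredBelief()
for victim in ("victim1", "victim2", "victim3"):
    belief.add_variable(victim, {"kitchen": 0.5, "hall": 0.5})
belief.apply_constraint(
    "victim1", "victim2", lambda x, y: 0.9 if x == y else 0.1)
```

After the constraint, `victim1` and `victim2` share one joint factor while `victim3` remains separate, so inference over `victim3` never touches the larger table.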
Planning to Give Information in Partially Observed Domains with a Learned Weighted Entropy Model
Chitnis, Rohan, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás
In many real-world robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human teammate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. Importantly, we should expect the human to have preferences about what information they are given and when they are given it. In this work, we adopt an information-theoretic view of the human's preferences: the human scores a piece of information as a function of the induced reduction in weighted entropy of their belief about the environment state. We formulate this setting as a POMDP and give a practical algorithm for solving it approximately. Then, we give an algorithm that allows the agent to sample-efficiently learn the human's preferences online. Finally, we describe an extension in which the human's preferences are time-varying. We validate our approach experimentally in two planning domains: a 2D robot mining task and a more realistic 3D robot fetching task.
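A small sketch may help make the scoring idea concrete. It assumes the common form of weighted entropy, H_w(p) = -Σ_s w(s) p(s) log p(s), a discrete belief, and a score equal to the entropy reduction itself; the abstract only says the score is some function of that reduction, so all three are assumptions, as are the function names.

```python
import math


def weighted_entropy(belief, weights):
    """Assumed form: -sum over states of w(s) * p(s) * log p(s)."""
    return -sum(weights[s] * p * math.log(p)
                for s, p in belief.items() if p > 0.0)


def condition(belief, consistent_states):
    """Update the belief on a declarative fact by discarding the states
    that contradict it and renormalizing the rest."""
    mass = sum(p for s, p in belief.items() if s in consistent_states)
    if mass == 0.0:
        raise ValueError("information contradicts the current belief")
    return {s: (p / mass if s in consistent_states else 0.0)
            for s, p in belief.items()}


def information_score(belief, weights, consistent_states):
    """Reduction in weighted entropy induced by the information.
    Using the reduction directly as the score is an assumption."""
    posterior = condition(belief, consistent_states)
    return weighted_entropy(belief, weights) - weighted_entropy(posterior, weights)


# Usage: the human is uniformly uncertain over four states and is told
# that the true state is either "a" or "b".
prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
uniform_weights = {s: 1.0 for s in prior}
score = information_score(prior, uniform_weights, {"a", "b"})
```

With uniform weights this reduces to ordinary Shannon entropy reduction; non-uniform weights let some states matter more to the human than others.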
Guiding Search in Continuous State-Action Spaces by Learning an Action Sampler From Off-Target Search Experience
Kim, Beomjoon (Massachusetts Institute of Technology) | Kaelbling, Leslie Pack (Massachusetts Institute of Technology) | Lozano-Pérez, Tomás (Massachusetts Institute of Technology)
In robotics, it is essential to be able to plan efficiently in high-dimensional continuous state-action spaces for long horizons. For such complex planning problems, unguided uniform sampling of actions until a path to a goal is found is hopelessly inefficient, and gradient-based approaches often fall short when the optimization manifold of a given problem is not smooth. In this paper, we present an approach that guides search in continuous spaces for generic planners by learning an action sampler from past search experience. We use a Generative Adversarial Network (GAN) to represent an action sampler, and address an important issue: search experience consists of a relatively large number of actions that are not on a solution path and a relatively small number of actions that actually are on a solution path. We introduce a new technique, based on an importance-ratio estimation method, for using samples from a non-target distribution to make GAN learning more data-efficient. We provide theoretical guarantees and empirical evaluation in three challenging continuous robot planning problems to illustrate the effectiveness of our algorithm.
STRIPS Planning in Infinite Domains
Garrett, Caelan Reed, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack
Many robotic planning applications involve continuous actions with highly non-linear constraints, which cannot be modeled using modern planners that construct a propositional representation. We introduce STRIPStream: an extension of the STRIPS language which can model these domains by supporting the specification of blackbox generators to handle complex constraints. The outputs of these generators interact with actions through possibly infinite streams of objects and static predicates. We provide two algorithms which both reduce STRIPStream problems to a sequence of finite-domain planning problems. The representation and algorithms are entirely domain independent. We demonstrate our framework on simple illustrative domains, and then on a high-dimensional, continuous robotic task and motion planning domain.
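The reduction of an infinite-domain problem to a sequence of finite ones can be sketched with a toy loop. This is only an illustration under strong assumptions and is not the STRIPStream API or either of the paper's algorithms: `pose_stream`, `solve_finite`, and `incremental_solve` are hypothetical names, and the "finite planner" here is a trivial search over the objects enumerated so far.

```python
from itertools import count, islice


def pose_stream():
    """Hypothetical blackbox generator: an infinite stream of candidate poses."""
    for i in count():
        yield 0.5 * i


def solve_finite(objects, goal_test):
    """Stand-in for a finite-domain planner over the objects seen so far."""
    for obj in objects:
        if goal_test(obj):
            return obj
    return None


def incremental_solve(stream_fn, goal_test, batch=2, max_rounds=10):
    """Alternate between drawing a few more objects from the stream and
    solving the resulting finite problem, until a solution appears."""
    stream = stream_fn()
    objects = []
    for _ in range(max_rounds):
        objects.extend(islice(stream, batch))
        solution = solve_finite(objects, goal_test)
        if solution is not None:
            return solution, len(objects)
    return None, len(objects)


# Usage: find a pose of at least 1.7, drawing two poses per round.
result = incremental_solve(pose_stream, lambda pose: pose >= 1.7)
```

Each round poses a strictly larger finite problem, so the generator is only queried as far as the task requires.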
Focused Model-Learning and Planning for Non-Gaussian Continuous State-Action Systems
Wang, Zi, Jegelka, Stefanie, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás
We introduce a framework for model learning and planning in stochastic domains with continuous state and action spaces and non-Gaussian transition models. It is efficient because (1) local models are estimated only when the planner requires them; (2) the planner focuses on the states most relevant to the current planning problem; and (3) the planner focuses on the most informative and/or high-value actions. Our theoretical analysis shows the validity and asymptotic optimality of the proposed approach. Empirically, we demonstrate the effectiveness of our algorithm on a simulated multi-modal pushing problem.
Bayesian Optimization with Exponential Convergence
Kawaguchi, Kenji, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás
This paper presents a Bayesian optimization method with exponential convergence that requires neither auxiliary optimization nor delta-cover sampling. Most Bayesian optimization methods require auxiliary optimization: an additional non-convex global optimization problem, which can be time-consuming and hard to implement in practice. In addition, the existing Bayesian optimization method with exponential convergence requires access to delta-cover sampling, which has been considered impractical. Our approach eliminates both requirements and achieves an exponential convergence rate.
POMCoP: Belief Space Planning for Sidekicks in Cooperative Games
Macindoe, Owen (Massachusetts Institute of Technology) | Kaelbling, Leslie Pack (Massachusetts Institute of Technology) | Lozano-Pérez, Tomás (Massachusetts Institute of Technology)
We present POMCoP, a system for online planning in collaborative domains that reasons about how its actions will affect its understanding of human intentions, and demonstrate its use in building sidekicks for cooperative games. POMCoP plans in belief space. It explicitly represents its uncertainty about the intentions of its human ally, and plans actions which reveal those intentions or hedge against its uncertainty. This allows POMCoP to reason about the usefulness of incorporating information gathering actions into its plans, such as asking questions, or simply waiting to let humans reveal their intentions. We demonstrate POMCoP by constructing a sidekick for a cooperative pursuit game, and evaluate its effectiveness relative to MDP-based techniques that plan in state space, rather than belief space.