Roy, Nicholas
Batch-iFDD for Representation Expansion in Large MDPs
Geramifard, Alborz, Walsh, Thomas J., Roy, Nicholas, How, Jonathan
Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state of the art MP algorithm.
CORL: A Continuous-state Offset-dynamics Reinforcement Learner
Brunskill, Emma, Leffler, Bethany, Li, Lihong, Littman, Michael L., Roy, Nicholas
Continuous state spaces and stochastic, switching dynamics characterize a number of rich, realworld domains, such as robot navigation across varying terrain. We describe a reinforcementlearning algorithm for learning in these domains and prove for certain environments the algorithm is probably approximately correct with a sample complexity that scales polynomially with the state-space dimension. Unfortunately, no optimal planning techniques exist in general for such problems; instead we use fitted value iteration to solve the learned MDP, and include the error due to approximate planning in our bounds. Finally, we report an experiment using a robotic car driving over varying terrain to demonstrate that these dynamics representations adequately capture real-world dynamics and that our algorithm can be used to efficiently solve such problems.
Approaching the Symbol Grounding Problem with Probabilistic Graphical Models
Tellex, Stefanie (Massachusetts Institute of Technology) | Kollar, Thomas (Massachusetts Institute of Technology) | Dickerson, Steven (Massachusetts Institute of Technology) | Walter, Matthew R. (Massachusetts Institute of Technology) | Banerjee, Ashis Gopal (Massachusetts Institute of Technology) | Teller, Seth (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
A solution to this symbol grounding problem (Harnad, 1990) would enable a robot to interpret commands such as "Drive over to receiving and pick up the tire pallet." In this article we describe several of our results that use probabilistic inference to address the symbol grounding problem. Our specific approach is to develop models that factor according to the linguistic structure of a command. We first describe an early result, a generative model that factors according to the sequential structure of language, and then discuss our new framework, generalized grounding graphs (G3).
Approaching the Symbol Grounding Problem with Probabilistic Graphical Models
Tellex, Stefanie (Massachusetts Institute of Technology) | Kollar, Thomas (Massachusetts Institute of Technology) | Dickerson, Steven (Massachusetts Institute of Technology) | Walter, Matthew R. (Massachusetts Institute of Technology) | Banerjee, Ashis Gopal (Massachusetts Institute of Technology) | Teller, Seth (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
n order for robots to engage in dialog with human teammates, they must have the ability to map between words in the language and aspects of the external world. A solution to this symbol grounding problem (Harnad, 1990) would enable a robot to interpret commands such as “Drive over to receiving and pick up the tire pallet.” In this article we describe several of our results that use probabilistic inference to address the symbol grounding problem. Our specific approach is to develop models that factor according to the linguistic structure of a command. We first describe an early result, a generative model that factors according to the sequential structure of language, and then discuss our new framework, generalized grounding graphs (G3). The G3 framework dynamically instantiates a probabilistic graphical model for a natural language input, enabling a mapping between words in language and concrete objects, places, paths and events in the external world. We report on corpus-based experiments where the robot is able to learn and use word meanings in three real-world tasks: indoor navigation, spatial language video retrieval, and mobile manipulation.
Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation
Tellex, Stefanie (Massachusetts Institute of Technology) | Kollar, Thomas (Massachusetts Institute of Technology) | Dickerson, Steven (Massachusetts Institute of Technology) | Walter, Matthew R. (Massachusetts Institute of Technology) | Banerjee, Ashis Gopal (Massachusetts Institute of Technology) | Teller, Seth (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. Previous approaches have used models with fixed structure to infer the likelihood of a sequence of actions given the environment and the command. In contrast, our framework, called Generalized Grounding Graphs, dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure. Our system performs inference in the model to successfully find and execute plans corresponding to natural language commands such as "Put the tire pallet on the truck." The model is trained using a corpus of commands collected using crowdsourcing. We pair each command with robot actions and use the corpus to learn the parameters of the model. We evaluate the robot's performance by inferring plans from natural language commands, executing each plan in a realistic robot simulator, and asking users to evaluate the system's performance. We demonstrate that our system can successfully follow many natural language commands from the corpus.
Planning to Perceive: Exploiting Mobility for Robust Object Detection
Velez, Javier (Massachusetts Institute of Technology) | Hemann, Garrett (Massachusetts Institute of Technology) | Huang, Albert S. (Massachusetts Institute of Technology) | Posner, Ingmar (Department of Engineering Science, University of Oxford) | Roy, Nicholas (Massachusetts Institute of Technology)
Consider the task of a mobile robot autonomously navigating through an environment while detecting and mapping objects of interest using a noisy object detector. The robot must reach its destination in a timely manner, but is rewarded for correctly detecting recognizable objects to be added to the map, and penalized for false alarms. However, detector performance typically varies with vantage point, so the robot benefits from planning trajectories which maximize the efficacy of the recognition system. This work describes an online, any-time planning framework enabling the active exploration of possible detections provided by an off-the-shelf object detector. We present a probabilistic approach where vantage points are identified which provide a more informative view of a potential object. The agent then weighs the benefit of increasing its confidence against the cost of taking a detour to reach each identified vantage point. The system is demonstrated to significantly improve detection and trajectory length in both simulated and real robot experiments.
Nonparametric Bayesian Policy Priors for Reinforcement Learning
Doshi-velez, Finale, Wingate, David, Roy, Nicholas, Tenenbaum, Joshua B.
We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.
A Discriminative Model for Understanding Natural Language Route Directions
Kollar, Thomas (Massachusetts Institute of Technology) | Tellex, Stefanie (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. However, determining the correct sequence of actions in response to a set of spoken instructions is a complex decision-making problem. There is a "semantic gap" between the high-level symbolic models of the world that people use, and the low-level models of geometry, state dynamics, and perceptions that robots use. In this paper, we show how this gap can be bridged by inferring the best sequence of actions from a linguistic description and environmental features. This work improves upon previous work in three ways. First, by using a conditional random field (CRF), we learn the relative weight of environmental and linguistic features, enabling the system to learn the meanings of words and reducing the modeling effort in learning how to follow commands. Second, a number of long-range features are added, which help the system to use additional structure in the problem. Finally, given a natural language command, we infer both the referred path and landmark directly, thereby requiring the algorithm to pick a landmark by which it should navigate. The CRF is demonstrated to have 15% error on a held-out dataset, when compared with 39% error for a Markov random field (MRF). Finally, by analyzing the additional annotations necessary for this work, we find that natural language route directions map sequentially onto the corresponding path and landmarks 99.6% of the time. In addition, the size of the referred landmark varies from 0m 2 to 1964m 2 and the length of the referred path varies from 0 m to 40.83 m .
PUMA: Planning Under Uncertainty with Macro-Actions
He, Ruijie (Massachusetts Institute of Technology) | Brunskill, Emma (University of California, Berkeley) | Roy, Nicholas (Massachusetts Institute of Technology)
Planning in large, partially observable domains is challenging, especially when a long-horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that plan a different potential action for each future observation can be prohibitively expensive when planning many steps ahead. An efficient solution for planning far into the future in fully observable domains is to use temporally-extended sequences of actions, or "macro-actions." In this paper, we present a POMDP algorithm for planning under uncertainty with macro-actions (PUMA) that automatically constructs and evaluates open-loop macro-actions within forward-search planning, where the planner branches on observations only at the end of each macro-action. Additionally, we show how to incrementally refine the plan over time, resulting in an anytime algorithm that provably converges to an epsilon-optimal policy. In experiments on several large POMDP problems which require a long horizon lookahead, PUMA outperforms existing state-of-the art solvers.
A Bayesian Nonparametric Approach to Modeling Mobility Patterns
Joseph, Joshua Mason (Massachusetts Institute of Technology) | Doshi-Velez, Finale (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
Constructing models of mobile agents can be difficult without domain-specific knowledge. Parametric models flexible enough to capture all mobility patterns that an expert believes are possible are often large, requiring a great deal of training data. In contrast, nonparametric models are extremely flexible and can generalize well with relatively little training data. We propose modeling the mobility patterns of moving agents as a mixture of Gaussian processes (GP) with a Dirichlet process (DP) prior over mixture weights. The GP provides a flexible representation for each individual mobility pattern, while the DP assigns observed trajectories to particular mobility patterns. Both the GPs and the DP adjust the model's complexity based on available data, implicitly avoiding issues of over-fitting or under-fitting. We apply our model to a helicopter-based tracking task, where the mobility patterns of the tracked agents — cars — are learned from real data collected from taxis in the greater Boston area.