
Collaborating Authors

 Platt, Robert


Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability

arXiv.org Artificial Intelligence

Many important robotics problems are partially observable in the sense that a single visual or force-feedback measurement is insufficient to reconstruct the state. Standard approaches involve learning a policy over beliefs or observation-action histories. However, both of these have drawbacks; it is expensive to track the belief online, and it is hard to learn policies directly over histories. We propose a method for policy learning under partial observability called the Belief-Grounded Network (BGN) in which an auxiliary belief-reconstruction loss incentivizes a neural network to concisely summarize its input history. Since the resulting policy is a function of the history rather than the belief, it can be executed easily at runtime. We compare BGN against several baselines on classic benchmark tasks as well as three novel robotic touch-sensing tasks. BGN outperforms all other tested methods and its learned policies work well when transferred onto a physical robot.
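
As a rough illustration of the auxiliary belief-reconstruction idea described above, the sketch below pairs a recurrent history encoder with a belief head trained against a simulator-provided belief. The architecture, the assumption of a discrete state space (so the belief is a probability vector), and the loss weighting are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch, assuming a discrete state space and a simulator that can
# supply the true belief during training; not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistoryPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, n_states, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, act_dim)   # action logits
        self.belief_head = nn.Linear(hidden, n_states)  # predicted belief logits

    def forward(self, obs_act_seq):
        # Summarize the observation-action history with a recurrent encoder.
        h, _ = self.rnn(obs_act_seq)
        return self.policy_head(h), self.belief_head(h)

def belief_grounded_loss(rl_loss, belief_logits, true_belief, aux_weight=1.0):
    # Auxiliary loss: match the predicted belief to the simulator-provided one,
    # added on top of whatever loss the base RL algorithm produces.
    aux = F.kl_div(F.log_softmax(belief_logits, dim=-1), true_belief,
                   reduction='batchmean')
    return rl_loss + aux_weight * aux
```

At runtime only the history encoder and policy head are needed, which is why the learned policy can be executed without tracking a belief.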


Learning visual servo policies via planner cloning

arXiv.org Artificial Intelligence

Visual servoing in novel environments is an important problem. Given images produced by a camera, a visual servo control policy guides a grasped part into a desired pose relative to the environment. This problem appears in many situations: reaching, grasping, peg insertion, stacking, machine assembly tasks, etc. Whereas classical approaches to the problem [6, 3, 27] typically make strong assumptions about the environment (fiducials, known object geometries, etc.), there has been a surge of interest recently in using deep learning methods to solve these problems in more unstructured settings that incorporate novel objects [29, 14, 26, 8, 21, 28, 12, 13].

This algorithm differs from AGGREVATE because it incorporates the value penalties and from DQfD because it uses supervised targets rather than TD targets. We compare PQC with several baselines and algorithm ablations and show that it outperforms all these variations on two …
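
The note above contrasts PQC's supervised value targets and value penalties with TD-based alternatives. The sketch below shows one plausible form such a loss could take: regressing predicted Q-values toward planner-provided values plus a margin-style penalty on non-expert actions. The penalty form, shapes, and weighting are assumptions for illustration, not the paper's formulation.

```python
# A hedged sketch of planner cloning with supervised value targets; the
# DQfD-style margin term is a guess at the "value penalties" mentioned above.
import torch
import torch.nn.functional as F

def planner_cloning_loss(q_pred, q_planner, expert_action, penalty=0.1):
    """q_pred, q_planner: (batch, n_actions); expert_action: (batch,) long tensor."""
    # Regress predicted Q-values toward the planner's values (supervised targets,
    # rather than bootstrapped TD targets).
    regression = F.mse_loss(q_pred, q_planner)
    # Margin penalty: the planner's chosen action should score at least
    # `penalty` above every other action; the loss is zero once it does.
    margins = torch.full_like(q_pred, penalty)
    margins.scatter_(1, expert_action.unsqueeze(1), 0.0)
    expert_q = q_pred.gather(1, expert_action.unsqueeze(1)).squeeze(1)
    margin_loss = ((q_pred + margins).max(dim=1).values - expert_q).mean()
    return regression + margin_loss
```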


Learning Multi-Level Hierarchies with Hindsight

arXiv.org Artificial Intelligence

Multi-level hierarchies have the potential to accelerate learning in sparse reward tasks because they can divide a problem into a set of short horizon subproblems. In order to realize this potential, Hierarchical Reinforcement Learning (HRL) algorithms need to be able to learn the multiple levels within a hierarchy in parallel, so these simpler subproblems can be solved simultaneously. Yet most existing HRL methods that can learn hierarchies are not able to efficiently learn multiple levels of policies at the same time, particularly in continuous domains. To address this problem, we introduce a framework that can learn multiple levels of policies in parallel. Our approach consists of two main components: (i) a particular hierarchical architecture and (ii) a method for jointly learning multiple levels of policies. The hierarchies produced by our framework are comprised of a set of nested, goal-conditioned policies that use the state space to decompose a task into short subtasks. All policies in the hierarchy are learned simultaneously using two types of hindsight transitions. We demonstrate experimentally in both grid world and simulated robotics domains that our approach can significantly accelerate learning relative to other non-hierarchical and hierarchical methods. Indeed, our framework is the first to successfully learn 3-level hierarchies in parallel in tasks with continuous state and action spaces.
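
A minimal sketch of the goal-relabeling mechanism that hindsight transitions rely on is given below, assuming a simple replay format in which the state actually reached at the end of an episode is treated as the achieved goal. It illustrates the general hindsight idea rather than the paper's two specific transition types.

```python
# Sketch of hindsight goal relabeling under an assumed replay format;
# `reward_fn` is an assumed sparse goal-reaching reward.
def hindsight_goal_transitions(episode, reward_fn):
    """episode: list of (state, action, next_state, original_goal) tuples."""
    achieved_goal = episode[-1][2]  # pretend the state actually reached was the goal
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = reward_fn(next_state, achieved_goal)  # e.g. 0 if reached, else -1
        relabeled.append((state, action, reward, next_state, achieved_goal))
    return relabeled
```

Because every level of the hierarchy proposes goals for the level below it, relabeling of this kind lets each level learn from subgoals that were actually achieved even when the original subgoal was not.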


Online abstraction with MDP homomorphisms for Deep Learning

arXiv.org Machine Learning

Abstraction of Markov Decision Processes is a useful tool for solving complex problems, as it can ignore unimportant aspects of an environment, simplifying the process of learning an optimal policy. In this paper, we propose a new algorithm for finding abstract MDPs in environments with continuous state spaces. It is based on MDP homomorphisms, a structure-preserving mapping between MDPs. We demonstrate our algorithm's ability to learn abstractions from collected experience and show how to reuse the abstractions to guide exploration in new tasks the agent encounters. Our novel task transfer method beats a baseline based on a deep Q-network.
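
As a loose illustration of how collected experience might be grouped into abstract states, the sketch below buckets transitions by an (action, reward, next-block) signature. This is a simplification of the MDP homomorphism conditions, not the paper's algorithm; `state_key` is an assumed featurization of continuous states into hashable candidate blocks.

```python
# Illustrative bucketing of transitions, a simplification of the homomorphism
# conditions (same action, same reward, same abstract next-state).
from collections import defaultdict

def partition_experience(transitions, state_key):
    """transitions: iterable of (state, action, reward, next_state) tuples."""
    blocks = defaultdict(list)
    for s, a, r, s_next in transitions:
        # Transitions sharing an originating block and a signature are treated
        # as equivalent under the candidate abstraction.
        signature = (a, round(r, 3), state_key(s_next))
        blocks[(state_key(s), signature)].append((s, a, r, s_next))
    return blocks
```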


Adapting control policies from simulation to reality using a pairwise loss

arXiv.org Artificial Intelligence

This paper proposes an approach to domain transfer based on a pairwise loss function that helps transfer control policies learned in simulation onto a real robot. We explore the idea in the context of a 'category level' manipulation task where a control policy is learned that enables a robot to perform a mating task involving novel objects. We consider the case where depth images are used as the main form of sensor input. Our experimental results demonstrate that the proposed method consistently outperforms baseline methods that train only in simulation or that combine real and simulated data in a naive way.
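
One plausible reading of a pairwise loss of this kind is sketched below: the features a network computes for corresponding simulated and real depth images are pulled together by an extra penalty added to the task loss. The encoder, the pairing assumption, and the weighting are illustrative placeholders, not the paper's exact formulation.

```python
# Sketch of a pairwise feature-alignment term added to an ordinary task loss;
# sim_images[i] and real_images[i] are assumed to depict the same scene.
import torch.nn.functional as F

def pairwise_transfer_loss(encoder, sim_images, real_images, task_loss, weight=1.0):
    f_sim = encoder(sim_images)
    f_real = encoder(real_images)
    pair_loss = F.mse_loss(f_sim, f_real)  # penalize feature mismatch across domains
    return task_loss + weight * pair_loss
```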


Hierarchical Reinforcement Learning with Hindsight

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) algorithms can suffer from poor sample efficiency when rewards are delayed and sparse. We introduce a solution that enables agents to learn temporally extended actions at multiple levels of abstraction in a sample efficient and automated fashion. Our approach combines universal value functions and hindsight learning, allowing agents to learn policies belonging to different time scales in parallel. We show that our method significantly accelerates learning in a variety of discrete and continuous tasks.
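
The approach builds on universal (goal-conditioned) value functions; a minimal goal-conditioned Q-network of the kind it presupposes is sketched below, with illustrative dimensions and architecture.

```python
# Minimal goal-conditioned ("universal") Q-network; sizes are illustrative.
import torch
import torch.nn as nn

class UniversalQ(nn.Module):
    def __init__(self, state_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, goal, action):
        # Estimated return of taking `action` in `state` while pursuing `goal`.
        return self.net(torch.cat([state, goal, action], dim=-1))
```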


Coarticulation in Markov Decision Processes

Neural Information Processing Systems

We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal state-value function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities.
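
One way to read the merging step is sketched below: among actions that remain admissible (here, within an assumed epsilon of optimal) for the superior activity, pick the action preferred by the subordinate activity. The epsilon threshold and the Q-function interfaces are assumptions for illustration, not the paper's construction.

```python
# Illustrative merging of two activities via an assumed epsilon-admissible set.
def merged_action(state, q_superior, q_subordinate, actions, eps=0.05):
    best = max(q_superior(state, a) for a in actions)
    admissible = [a for a in actions if q_superior(state, a) >= best - eps]
    return max(admissible, key=lambda a: q_subordinate(state, a))
```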