Goto

Collaborating Authors

 Reinforcement Learning


Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

#artificialintelligence

Reinforcement learning provides a general framework for flexible decision making and control, but requires extensive data collection for each new task that an agent needs to learn. In other machine learning fields, such as natural language processing or computer vision, pre-training on large, previously collected datasets to bootstrap learning for new tasks has emerged as a powerful paradigm to reduce data requirements when learning a new task. In this paper, we ask the following question: how can we enable similarly useful pre-training for RL agents? We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials from a wide range of previously seen tasks, and we show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors. We demonstrate the effectiveness of our approach in challenging robotic manipulation domains involving image observations and sparse reward functions, where our method outperforms prior works by a substantial margin.


Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking

#artificialintelligence

Distributed training has been shown to greatly accelerate the training of RL agents with respect to wall clock time (Mnih et al., 2016; Espeholt et al., 2018). Instead of interacting with a single environment at a time, the agent interacts with a set of differently seeded copies of the environment to diversify experience and increase throughput. For simplicity and to ensure reproducibility, Tonic uses a synchronous training loop illustrated in Figure 3.


Robots deciding their next move need help prioritizing

#artificialintelligence

As robots replace humans in dangerous situations such as search and rescue missions, they need to be able to quickly assess and make decisions--to react and adapt like a human being would. Researchers at the University of Illinois at Urbana-Champaign used a model based on the game Capture the Flag to develop a new take on deep reinforcement learning that helps robots evaluate their next move. The team of researchers chose Capture the Flag because it's played with two teams, each with multiple teammates, where the opposing team is also making decisions. "Robots can learn how to react in an environment like a competitive game by using a kind of trial and error process, called reinforcement learning. They learn what actions to take in a given situation by playing the game," said Huy Tran, a researcher in the Department of Aerospace Engineering at UIUC. "The challenge is to figure out how to create agents that can also adapt to unexpected situations."


Learning State Abstractions for Long-Horizon Planning

#artificialintelligence

Many tasks that we do on a regular basis, such as navigating a city, cooking a meal, or loading a dishwasher, require planning over extended periods of time. Accomplishing these tasks may seem simple to us; however, reasoning over long time horizons remains a major challenge for today's Reinforcement Learning (RL) algorithms. While unable to plan over long horizons, deep RL algorithms excel at learning policies for short horizon tasks, such as robotic grasping, directly from pixels. At the same time, classical planning methods such as Dijkstra's algorithm and A$ *$ search can plan over long time horizons, but they require hand-specified or task-specific abstract representations of the environment as input. To achieve the best of both worlds, state-of-the-art visual navigation methods have applied classical search methods to learned graphs.


Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments

arXiv.org Artificial Intelligence

We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem. Model-free reinforcement learning algorithms can achieve good asymptotic performance in multi-task learning at a cost of extensive sampling, due to their approach, which requires learning from scratch. While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties. Meta-reinforcement learning addresses the efficiency and generalization challenges on multi task learning by quickly leveraging the meta-prior policy for a new task. In this paper, we propose a meta-reinforcement learning approach to learn the dynamic model of a non-stationary environment to be used for meta-policy optimization later. Due to the sample efficiency of model-based learning methods, we are able to simultaneously train both the meta-model of the non-stationary environment and the meta-policy until dynamic model convergence. Then, the meta-learned dynamic model of the environment will generate simulated data for meta-policy optimization. Our experiment demonstrates that our proposed method can meta-learn the policy in a non-stationary environment with the data efficiency of model-based learning approaches while achieving the high asymptotic performance of model-free meta-reinforcement learning.


Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research

arXiv.org Artificial Intelligence

Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect of widening the gap between those with ample access to computational resources, and those without. In this work we argue that, despite the community's emphasis on large-scale environments, the traditional small-scale environments can still yield valuable scientific insights and can help reduce the barriers to entry for underprivileged communities. To substantiate our claims, we empirically revisit the paper which introduced the Rainbow algorithm [Hessel et al., 2018] and present some new insights into the algorithms used by Rainbow.


Bridging Scene Understanding and Task Execution with Flexible Simulation Environments

arXiv.org Artificial Intelligence

Significant progress has been made in scene understanding which seeks to build 3D, metric and object-oriented representations of the world. Concurrently, reinforcement learning has made impressive strides largely enabled by advances in simulation. Comparatively, there has been less focus in simulation for perception algorithms. Simulation is becoming increasingly vital as sophisticated perception approaches such as metric-semantic mapping or 3D dynamic scene graph generation require precise 3D, 2D, and inertial information in an interactive environment. To that end, we present TESSE (Task Execution with Semantic Segmentation Environments), an open source simulator for developing scene understanding and task execution algorithms. TESSE has been used to develop state-of-the-art solutions for metric-semantic mapping and 3D dynamic scene graph generation. Additionally, TESSE served as the platform for the GOSEEK Challenge at the International Conference of Robotics and Automation (ICRA) 2020, an object search competition with an emphasis on reinforcement learning. Code for TESSE is available at https://github.com/MIT-TESSE.


Graph-based State Representation for Deep Reinforcement Learning

arXiv.org Machine Learning

Deep RL approaches build much of their success on the ability of the deep neural network to generate useful internal representations. Nevertheless, they suffer from a high sample-complexity and starting with a good input representation can have a significant impact on the performance. In this paper, we exploit the fact that the underlying Markov decision process (MDP) represents a graph, which enables us to incorporate the topological information for effective state representation learning. Motivated by the recent success of node representations for several graph analytical tasks we specifically investigate the capability of node representation learning methods to effectively encode the topology of the underlying MDP in Deep RL. To this end we perform a comparative analysis of several models chosen from 4 different classes of representation learning algorithms for policy learning in grid-world navigation tasks, which are representative of a large class of RL problems. We find that all embedding methods outperform the commonly used matrix representation of grid-world environments in all of the studied cases. Moreoever, graph convolution based methods are outperformed by simpler random walk based methods and graph linear autoencoders.


Provable Multi-Objective Reinforcement Learning with Generative Models

arXiv.org Artificial Intelligence

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real world tasks where multiple objectives exist without known relative costs. We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives. Existing methods require strong assumptions such as exact knowledge of the multi-objective Markov decision process, and are analyzed in the limit of infinite data and time. We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm in Yang, 2019. Our method can learn a near-optimal value function with polynomial sample complexity and linear convergence speed. To the best of our knowledge, this is the first finite-sample analysis of MORL algorithms.


Inverse Reinforcement Learning via Matching of Optimality Profiles

arXiv.org Artificial Intelligence

The goal of inverse reinforcement learning (IRL) is to infer a reward function that explains the behavior of an agent performing a task. The assumption that most approaches make is that the demonstrated behavior is near-optimal. In many real-world scenarios, however, examples of truly optimal behavior are scarce, and it is desirable to effectively leverage sets of demonstrations of suboptimal or heterogeneous performance, which are easier to obtain. We propose an algorithm that learns a reward function from such demonstrations together with a weak supervision signal in the form of a distribution over rewards collected during the demonstrations (or, more generally, a distribution over cumulative discounted future rewards). We view such distributions, which we also refer to as optimality profiles, as summaries of the degree of optimality of the demonstrations that may, for example, reflect the opinion of a human expert. Given an optimality profile and a small amount of additional supervision, our algorithm fits a reward function, modeled as a neural network, by essentially minimizing the Wasserstein distance between the corresponding induced distribution and the optimality profile. We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.