Our goal is to pre-train reinforcement learning models on a diverse dataset and then transfer knowledge (either zero-shot or with fine-tuning) to a different test environment. In the last decade, we've seen learning-based systems provide transformative solutions for a wide range of perception and reasoning problems, from recognizing objects in images to recognizing and translating human speech. If fruitful, this line of work could allow learning-based systems to tackle active control tasks, such as robotics and autonomous driving, alongside the passive perception tasks to which they have already been successfully applied. While deep reinforcement learning methods – like Soft Actor Critic– can learn impressive motor skills, they are challenging to train on large and broad data that is not from the target environment. In contrast, the success of deep networks in fields like computer vision was arguably predicated just as much on large datasets, such as ImageNet, as on large neural network architectures.
Model-free deep reinforcement learning (RL) has demonstrated its superiority on many complex sequential decision-making problems. However, heavy dependence on dense rewards and high sample-complexity impedes the wide adoption of these methods in real-world scenarios. On the other hand, imitation learning (IL) learns effectively in sparse-rewarded tasks by leveraging the existing expert demonstrations. In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive, and the quality of demonstrations typically limits the performance of the learning policy. In this work, we propose Self-Adaptive Imitation Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations for highly challenging sparse reward tasks. SAIL bridges the advantages of IL and RL to reduce the sample complexity substantially, by effectively exploiting sup-optimal demonstrations and efficiently exploring the environment to surpass the demonstrated performance. Extensive empirical results show that not only does SAIL significantly improve the sample-efficiency but also leads to much better final performance across different continuous control tasks, comparing to the state-of-the-art.
This paper explores the problem of task learning and planning, contributing the Action-Category Representation (ACR) to improve computational performance of both Planning and Reinforcement Learning (RL). ACR is an algorithm-agnostic, abstract data representation that maps objects to action categories (groups of actions), inspired by the psychological concept of action codes. We validate our approach in StarCraft and Lightworld domains; our results demonstrate several benefits of ACR relating to improved computational performance of planning and RL, by reducing the action space for the agent.
Learning from offline task demonstrations is a problem of great interest in robotics. For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. However, leveraging a fixed batch of data can be problematic for larger datasets and longer-horizon tasks with greater variations. The data can exhibit substantial diversity and consist of suboptimal solution approaches. In this paper, we propose Implicit Reinforcement without Interaction at Scale (IRIS), a novel framework for learning from large-scale demonstration datasets. IRIS factorizes the control problem into a goal-conditioned low-level controller that imitates short demonstration sequences and a high-level goal selection mechanism that sets goals for the low-level and selectively combines parts of suboptimal solutions leading to more successful task completions. We evaluate IRIS across three datasets, including the RoboTurk Cans dataset collected by humans via crowdsourcing, and show that performant policies can be learned from purely offline learning. Additional results and videos at https://stanfordvl.github.io/iris/ .
Due to the realization that deep reinforcement learning algorithms trained on high-dimensional tasks can strongly overfit to their training environments, there have been several studies that investigated the generalization performance of these algorithms. However, there has been no similar study that evaluated the generalization performance of algorithms that were specifically designed for generalization, i.e. meta-reinforcement learning algorithms. In this paper, we assess the generalization performance of these algorithms by leveraging high-dimensional, procedurally generated environments. We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks. We also observe that scalability to high-dimensional tasks with sparse rewards remains a significant problem among many of the current meta-reinforcement learning algorithms. With these results, we highlight the need for developing meta-reinforcement learning algorithms that can both generalize and scale.