This project combines recent advances in experience replay techniques, namely Combined Experience Replay (CER), Prioritized Experience Replay (PER), and Hindsight Experience Replay (HER). We show the results of combining these techniques with the DDPG and DQN methods. CER always adds the most recent experience to the sampled batch. PER chooses which experiences to replay based on how beneficial they are expected to be for learning. HER learns from failure by substituting the desired goal with the achieved goal and recomputing the reward. The effectiveness of combinations of these experience replay techniques is tested in a variety of OpenAI Gym environments.
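The two simplest mechanisms described above, HER goal substitution and CER batching, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the `Transition` layout, the `sparse_reward` function (0 on success, -1 otherwise), and the choice of the goal achieved at the end of the episode as the substitute goal are all hypothetical choices made for the example.

```python
import random
from typing import List, NamedTuple, Tuple

Goal = Tuple[float, ...]


class Transition(NamedTuple):
    # Hypothetical transition layout for a goal-conditioned task.
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]
    achieved_goal: Goal  # goal actually reached after taking the action
    desired_goal: Goal   # goal the agent was asked to reach


def sparse_reward(achieved: Goal, desired: Goal, tol: float = 1e-3) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the goal, -1.0 otherwise."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode: List[Transition]) -> List[Transition]:
    """Substitute the desired goal with the goal achieved at the end of the
    episode and recompute every reward against that substitute goal."""
    new_goal = episode[-1].achieved_goal
    return [
        t._replace(desired_goal=new_goal,
                   reward=sparse_reward(t.achieved_goal, new_goal))
        for t in episode
    ]


def cer_sample(buffer: List[Transition], batch_size: int,
               rng: random.Random) -> List[Transition]:
    """Uniformly sample batch_size - 1 older transitions, then always append
    the most recent one. Assumes a non-empty buffer."""
    k = max(0, min(batch_size - 1, len(buffer) - 1))
    batch = rng.sample(buffer[:-1], k)
    batch.append(buffer[-1])
    return batch


# A failed 1-D episode: the agent wanted to reach 5.0 but stopped at 2.0.
episode = [
    Transition((0.0,), 1, -1.0, (1.0,), (1.0,), (5.0,)),
    Transition((1.0,), 1, -1.0, (2.0,), (2.0,), (5.0,)),
]
relabeled = her_relabel(episode)      # last transition now has reward 0.0
replay_buffer = episode + relabeled
batch = cer_sample(replay_buffer, 3, random.Random(0))
```

After relabeling, the final transition counts as a success for the substitute goal, so the failed episode still yields a non-trivial learning signal. The CER batch always ends with the newest transition in the buffer.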
In off-policy deep reinforcement learning, it is usually hard to collect sufficient successful experiences with sparse rewards to learn from. Hindsight experience replay (HER) enables an agent to learn from failures by treating the achieved state of a failed experience as a pseudo goal. However, not all failed experiences are equally useful at different learning stages, so it is inefficient to replay all of them or to sample them uniformly. In this paper, we propose to 1) adaptively select the failed experiences for replay according to their proximity to the true goals and the curiosity of exploration over diverse pseudo goals, and 2) gradually change the proportion of goal-proximity and diversity-based curiosity in the selection criteria: we adopt a human-like learning strategy that enforces more curiosity in earlier stages and shifts to larger goal-proximity later. This "Goal-and-Curiosity-driven Curriculum Learning" leads to "Curriculum-guided HER (CHER)", which adaptively and dynamically controls the exploration-exploitation trade-off during the learning process via hindsight experience selection.
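To make the selection idea concrete, the sketch below greedily picks pseudo goals by a score that trades off goal-proximity against diversity, with a weight that grows over training. It is only an illustration of the trade-off described above and should not be read as the paper's selection procedure: the scoring rule (negative distance to the true goal plus distance to the nearest already-selected pseudo goal), the function names, and the exponential schedule with parameters `lam0` and `eta` are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Goal = Tuple[float, ...]


def distance(a: Goal, b: Goal) -> float:
    """Euclidean distance between two goals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def proximity_weight(epoch: int, lam0: float = 0.01, eta: float = 0.5) -> float:
    """Hypothetical schedule: the weight on goal-proximity grows with training,
    so early epochs favour diversity and later epochs favour proximity."""
    return lam0 * (1.0 + eta) ** epoch


def select_pseudo_goals(achieved: Sequence[Goal], true_goal: Goal,
                        k: int, lam: float) -> List[int]:
    """Greedily select up to k indices into `achieved`.

    Score of a candidate = lam * proximity + diversity, where
      proximity = -distance(candidate, true_goal)
      diversity = distance to the nearest already-selected pseudo goal
                  (0.0 while nothing is selected yet).
    """
    selected: List[int] = []
    remaining = list(range(len(achieved)))

    def score(i: int) -> float:
        proximity = -distance(achieved[i], true_goal)
        diversity = (min(distance(achieved[i], achieved[j]) for j in selected)
                     if selected else 0.0)
        return lam * proximity + diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


achieved_goals = [(0.0,), (0.1,), (5.0,), (9.0,)]
true_goal = (10.0,)

early = select_pseudo_goals(achieved_goals, true_goal, 2, proximity_weight(0))
late = select_pseudo_goals(achieved_goals, true_goal, 2, proximity_weight(20))
# early == [3, 0]: the second pick is the pseudo goal farthest from the first.
# late  == [3, 2]: both picks are the pseudo goals closest to the true goal.
```

With a small weight the second pick maximises spread over pseudo goals, and with a large weight it collapses onto the candidates nearest the true goal, which mirrors the curiosity-first, proximity-later curriculum.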
Reinforcement Learning algorithms typically require millions of environment interactions to learn successful policies in sparse reward settings. Hindsight Experience Replay (HER) was introduced as a technique to increase sample efficiency through re-imagining unsuccessful trajectories as successful ones by replacing the originally intended goals. However, this method is not applicable to visual domains where the goal configuration is unknown and must be inferred from observation. In this work, we show how unsuccessful visual trajectories can be hallucinated to be successful using a generative model trained on relatively few snapshots of the goal. As far as we are aware, this is the first work that does so with the agent policy conditioned solely on its state. We then apply this model to training reinforcement learning agents in discrete and continuous settings. We show results on a navigation and pick-and-place task in a 3D environment and on a simulated robotics application. Our method shows marked improvement over standard RL algorithms and baselines derived from prior work.
Sparse reward problems are one of the biggest challenges in Reinforcement Learning. Goal-directed tasks are one such class of sparse reward problems, in which a reward signal is received only when the goal is reached. One promising way to train an agent to perform goal-directed tasks is to use Hindsight Learning approaches. In these approaches, even when an agent fails to reach the desired goal, it learns to reach the goal it achieved instead. By doing this over multiple trajectories while generalizing the policy learned from the achieved goals, the agent learns a goal-conditioned policy that can reach any goal. One such approach is Hindsight Experience Replay, which uses an off-policy Reinforcement Learning algorithm to learn a goal-conditioned policy. In this approach, past transitions are replayed uniformly at random. Another approach is to use a hindsight version of policy gradients to learn a policy directly. In this work, we discuss different ways to replay past transitions to improve learning in Hindsight Experience Replay, focusing on prioritized variants in particular. We also apply Hindsight Policy Gradient methods to robotic tasks.
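As a point of reference for what replacing uniform replay with a prioritized variant can look like, here is a minimal sketch of standard proportional prioritization: transition i is sampled with probability p_i^alpha / sum_j p_j^alpha, and importance-sampling weights (N * P(i))^(-beta) correct for the non-uniform sampling. It is not necessarily the variant studied in this work. The function names and the default `alpha` and `beta` values are illustrative assumptions, and priorities are assumed to be strictly positive (for example, absolute TD error plus a small constant).

```python
import random
from typing import List, Sequence, Tuple


def sampling_probabilities(priorities: Sequence[float],
                           alpha: float = 0.6) -> List[float]:
    """P(i) = p_i**alpha / sum_j p_j**alpha. alpha = 0 recovers uniform replay."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_prioritized(priorities: Sequence[float], batch_size: int,
                       rng: random.Random, alpha: float = 0.6,
                       beta: float = 0.4) -> Tuple[List[int], List[float]]:
    """Sample indices with replacement in proportion to priority and return
    importance-sampling weights normalised by the largest weight in the batch."""
    probs = sampling_probabilities(priorities, alpha)
    n = len(priorities)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    raw = [(n * probs[i]) ** -beta for i in indices]
    max_w = max(raw)
    return indices, [w / max_w for w in raw]


# Hypothetical priorities for four stored transitions.
priorities = [0.1, 0.5, 2.0, 4.0]
indices, is_weights = sample_prioritized(priorities, 8, random.Random(0))
```

In a training loop, each sampled transition's loss would be multiplied by its importance-sampling weight, and its priority would be refreshed after the update.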
Efficient learning in environments with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic arm control, Hindsight Experience Replay (HER) has been shown to be an effective solution. However, due to the brittleness of deterministic methods, HER and its variants typically suffer from poor stability and convergence, which significantly affects final performance. This challenge severely limits the applicability of such methods to complex real-world domains. To tackle this challenge, we propose Soft Hindsight Experience Replay (SHER), a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL) that combines the reuse of failed experiences with a maximum entropy probabilistic inference model. We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards. Experimental results show that, in contrast to HER and its variants, our proposed SHER achieves state-of-the-art performance, especially in the difficult HandManipulation tasks. Furthermore, SHER is more stable, achieving very similar performance across different random seeds.