space noise
Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space
Mangannavar, Rajesh, Srinivasaraghavan, Gopalakrishnan
Among the many variants of RL, an important class of problems is where the state and action spaces are continuous -- autonomous robots, autonomous vehicles, optimal control are all examples of such problems that can lend themselves naturally to reinforcement based algorithms, and have continuous state and action spaces. In this paper, we introduce a prioritized form of a combination of state-of-the-art approaches such as Deep Q-learning (DQN) and Deep Deterministic Policy Gradient (DDPG) to outperform the earlier results for continuous state and action space problems. Our experiments also involve the use of parameter noise during training resulting in more robust deep RL models outperforming the earlier results significantly. We believe these results are a valuable addition for continuous state and action space problems.
Reviews: Randomized Prior Functions for Deep Reinforcement Learning
Summary: This paper studies RL exploration based on uncertainty. First, they compare several previously published RL exploration methods and identifying their drawbacks (including illustrative toy experiments). Then, they extend a particular previous method, bootstrapped DQN [1] (which uses bootstrap uncertainty estimates), through the addition of random prior functions. This extension is motivated from Bayesian linear regression, and transferred to the case of deep non-linear neural networks. Experimental results on the Chain, CartPole swing-up and Montezuma Revenge show improved performance over a previous baseline, the bootstrapped DQN method.
Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning
Karino, Izumi, Tanaka, Kazutoshi, Niiyama, Ryuma, Kuniyoshi, Yasuo
This paper proposes an exploration method for deep reinforcement learning based on parameter space noise. Recent studies have experimentally shown that parameter space noise results in better exploration than the commonly used action space noise. Previous methods devised a way to update the diagonal covariance matrix of a noise distribution and did not consider the direction of the noise vector and its correlation. In addition, fast updates of the noise distribution are required to facilitate policy learning. We propose a method that deforms the noise distribution according to the accumulated returns and the noises that have led to the returns. Moreover, this method switches isotropic exploration and directional exploration in parameter space with regard to obtained rewards. We validate our exploration strategy in the OpenAI Gym continuous environments and modified environments with sparse rewards. The proposed method achieves results that are competitive with a previous method at baseline tasks. Moreover, our approach exhibits better performance in sparse reward environments by exploration with the switching strategy.
Parameter Space Noise for Exploration
Plappert, Matthias, Houthooft, Rein, Dhariwal, Prafulla, Sidor, Szymon, Chen, Richard Y., Chen, Xi, Asfour, Tamim, Abbeel, Pieter, Andrychowicz, Marcin
Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.
Better Exploration with Parameter Noise
Parameter noise helps algorithms more efficiently explore the range of actions available to solve an environment. After 216 episodes of training DDPG without parameter noise will frequently develop inefficient running behaviors, whereas policies trained with parameter noise often develop a high-scoring gallop. Parameter noise lets us teach agents tasks much more rapidly than with other approaches. After learning for 20 episodes on the HalfCheetah Gym environment (shown above), the policy achieves a score of around 3,000, whereas a policy trained with traditional action noise only achieves around 1,500. Parameter noise adds adaptive noise to the parameters of the neural network policy, rather than to its action space. Traditional RL uses action space noise to change the likelihoods associated with each action the agent might take from one moment to the next.