Reward Estimation for Variance Reduction in Deep Reinforcement Learning

Romoff, Joshua, Piché, Alexandre, Henderson, Peter, Francois-Lavet, Vincent, Pineau, Joelle

May-8-2018–arXiv.org Artificial Intelligence

In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high variance. Variance reduction methods such as advantage estimation and control-variates estimation have therefore been investigated in other works. Here, we propose to learn a separate reward estimator and use it to train the value function, which helps reduce the variance caused by a noisy reward signal. This yields theoretical reductions in variance in the tabular case, as well as empirical improvements in both the function approximation and tabular settings in environments where rewards are stochastic. To demonstrate this, we use a modified version of Advantage Actor Critic (A2C) on variations of Atari games.
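The core idea, replacing the sampled reward in the value function's TD target with the output of a separately learned reward estimator, can be illustrated in the tabular case. The sketch below is an illustration under stated assumptions, not the authors' code. It uses a hypothetical toy chain environment (`noisy_chain_episode`) with additive zero-mean Gaussian reward noise. Its reward estimator `r_hat` is simply a per-state running mean, and it uses plain TD(0) rather than the modified A2C from the paper. All function names and hyperparameters are hypothetical.

```python
import random
import statistics


def noisy_chain_episode(rng, n_states=3, noise_std=1.0):
    """Hypothetical toy environment: a deterministic chain 0 -> 1 -> ... -> terminal.
    Every step pays a mean reward of 1.0 plus zero-mean Gaussian noise."""
    transitions = []
    for s in range(n_states):
        r = 1.0 + rng.gauss(0.0, noise_std)
        done = s == n_states - 1
        s_next = s if done else s + 1  # s_next is ignored when done is True
        transitions.append((s, r, s_next, done))
    return transitions


def run_td0(seed, use_reward_estimator, n_states=3, episodes=500,
            alpha=0.1, gamma=0.9, noise_std=1.0):
    """Tabular TD(0). If use_reward_estimator is True, the TD target uses the
    learned estimate r_hat[s] instead of the sampled (noisy) reward."""
    rng = random.Random(seed)
    V = [0.0] * n_states
    r_hat = [0.0] * n_states  # assumed reward estimator: per-state sample mean
    counts = [0] * n_states
    for _ in range(episodes):
        for s, r, s_next, done in noisy_chain_episode(rng, n_states, noise_std):
            # Train the reward estimator on the observed reward (incremental mean).
            counts[s] += 1
            r_hat[s] += (r - r_hat[s]) / counts[s]
            # Train the value function on the estimated or the raw reward.
            reward = r_hat[s] if use_reward_estimator else r
            bootstrap = 0.0 if done else gamma * V[s_next]
            V[s] += alpha * (reward + bootstrap - V[s])
    return V


def value_spread(use_reward_estimator, seeds=range(100)):
    """Mean and across-seed variance of the final estimate of V(0)."""
    finals = [run_td0(seed, use_reward_estimator)[0] for seed in seeds]
    return statistics.mean(finals), statistics.variance(finals)


if __name__ == "__main__":
    for flag in (False, True):
        mean, var = value_spread(flag)
        print(f"reward estimator={flag}: mean V(0)={mean:.3f}, var={var:.5f}")
```

Under these assumptions the true value is V(0) = 1 + 0.9 + 0.81 = 2.71. Both variants should be centered near it, and the variant using the reward estimator should show a clearly smaller across-seed variance. The reason is that reward noise is averaged out inside `r_hat` before it enters the bootstrapped target, so the value update sees only the much smaller noise left in the estimator.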

artificial intelligence, machine learning, reinforcement learning

Country:
- North America
  - United States > New Jersey
    - Mercer County > Princeton (0.04)
  - Canada > Quebec
    - Montreal (0.15)

Genre:
- Research Report (0.40)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.56)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
