Smoothed Action Value Functions for Learning Gaussian Policies

Nachum, Ofir, Norouzi, Mohammad, Tucker, George, Schuurmans, Dale

Jun-11-2018–arXiv.org Artificial Intelligence

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships, we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

Jun-11-2018

arXiv.org PDF

Add feedback

Country:
- North America > Canada
  - Alberta (0.14)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
  - Sweden > Stockholm
    - Stockholm (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found