Policy Gradient using Weak Derivatives for Reinforcement Learning

Bhatt, Sujay; Koppel, Alec; Krishnamurthy, Vikram

Apr-9-2020–arXiv.org Machine Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient theorem is established that uses weak (measure-valued) derivatives instead of the score function; (ii) the resulting stochastic gradient estimates are shown to be unbiased and to yield algorithms that converge almost surely to stationary points of the non-convex value function of the reinforcement learning problem; (iii) the sample complexity of the algorithm is derived and shown to be $O(1/\sqrt{k})$; (iv) the expected variance of the gradient estimates obtained using weak derivatives is shown to be lower than that of estimates obtained using the popular score-function approach. Experiments on the OpenAI Gym pendulum environment show superior performance of the proposed algorithm.
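
To make the contrast between the two estimators concrete, here is a minimal sketch on a toy problem, not the paper's algorithm: it estimates the derivative of $E_{x \sim N(\mu, \sigma^2)}[f(x)]$ with respect to $\mu$, once with the score function and once with a weak (measure-valued) derivative. The one-dimensional Gaussian, the test function `f` (standing in for a Q-function), the function names, and the choice to reuse the same Rayleigh draw for both branches are all assumptions made for illustration. The weak-derivative form relies on the standard decomposition of the Gaussian mean derivative into a difference of two Rayleigh-shifted measures with constant $1/(\sigma\sqrt{2\pi})$.

```python
import numpy as np


def score_function_samples(f, mu, sigma, n, rng):
    """Per-sample score-function estimates of d/dmu E[f(x)], x ~ N(mu, sigma^2)."""
    x = rng.normal(mu, sigma, size=n)
    # grad_mu log N(x; mu, sigma^2) = (x - mu) / sigma^2
    return f(x) * (x - mu) / sigma**2


def weak_derivative_samples(f, mu, sigma, n, rng):
    """Per-sample weak-derivative estimates of the same quantity.

    The derivative of the Gaussian density in mu splits into a positive and a
    negative part; after normalisation these are the laws of mu + sigma*R and
    mu - sigma*R with R ~ Rayleigh(1), and the constant is 1/(sigma*sqrt(2*pi)).
    Assumption: the same draw R is reused for both branches (a coupling choice
    made here for illustration).
    """
    r = rng.rayleigh(scale=1.0, size=n)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return c * (f(mu + sigma * r) - f(mu - sigma * r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x: x**2          # hypothetical stand-in for a Q-function
    mu, sigma, n = 1.5, 1.0, 200_000

    sf = score_function_samples(f, mu, sigma, n, rng)
    wd = weak_derivative_samples(f, mu, sigma, n, rng)

    # For f(x) = x^2, E[f] = mu^2 + sigma^2, so the exact derivative is 2*mu.
    print("exact derivative       :", 2 * mu)
    print("score-function estimate:", sf.mean(), " sample variance:", sf.var())
    print("weak-derivative estimate:", wd.mean(), " sample variance:", wd.var())
```

In this toy setting both estimators target the same value, $2\mu$, and the per-sample variances can be compared directly. The comparison illustrates the kind of variance gap the abstract refers to, but it says nothing about the general case, which depends on the policy class and the Q-function.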

Country:
- North America
  - Canada > Alberta (0.14)
  - United States
    - Pennsylvania (0.04)
    - New York (0.04)
    - New Jersey > Mercer County
      - Princeton (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)
  - Romania > Sud-Est Development Region
    - Tulcea County > Tulcea (0.04)
- Asia > Middle East
  - Jordan (0.06)

Genre:
- Research Report (0.40)

Industry:
- Education > Focused Education > Special Education (0.45)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Statistical Learning > Gradient Descent (0.37)
