Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

Chen Tessler, Nadav Merlis, Shie Mannor

arXiv.org Artificial Intelligence 

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms to complex domains. However, these algorithms lack the theoretical guarantees available in the tabular setting and suffer from many stability and reproducibility problems \citep{henderson2018deep}. In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in the off-policy actor-critic deep reinforcement learning regime. Experiments on continuous action spaces in the MuJoCo control suite show that our proposed method reduces the variance of the learning process and improves overall performance.
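
To make the conservative-update idea concrete, below is a minimal Python sketch of one way such a rule could look: the currently deployed "safe" policy is replaced by a newly trained candidate only when a high-confidence lower bound on the candidate's return exceeds the safe policy's estimated return. The evaluate_policy and env_step interfaces, the Hoeffding-style bound, and all constants are illustrative assumptions, not the authors' implementation.

import math

def evaluate_policy(policy, env_step, n_episodes=20, horizon=200):
    # Roll out a policy in a toy 1-d environment and collect episodic returns.
    # env_step(state, action) -> (next_state, reward) is an assumed interface.
    returns = []
    for _ in range(n_episodes):
        total, state = 0.0, 0.0  # 0.0 stands in for env.reset()
        for _ in range(horizon):
            action = policy(state)
            state, reward = env_step(state, action)
            total += reward
        returns.append(total)
    return returns

def lower_confidence_bound(returns, delta=0.05, return_range=200.0):
    # Hoeffding-style lower bound on the mean return, holding with
    # probability at least 1 - delta when returns lie in [0, return_range].
    n = len(returns)
    mean = sum(returns) / n
    return mean - return_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def conservative_update(safe_policy, candidate_policy, env_step, delta=0.05):
    # Keep the safe policy unless the candidate is better with high confidence.
    safe_returns = evaluate_policy(safe_policy, env_step)
    safe_mean = sum(safe_returns) / len(safe_returns)
    cand_lcb = lower_confidence_bound(
        evaluate_policy(candidate_policy, env_step), delta)
    return candidate_policy if cand_lcb > safe_mean else safe_policy

# Toy usage: rewards in [0, 1] favor actions close to the current state,
# so returns over horizon=200 lie in [0, 200], matching return_range.
env_step = lambda s, a: (min(s + 0.01, 1.0), 1.0 - abs(a - s))
safe = lambda s: 0.0       # ignores the state
candidate = lambda s: s    # tracks the state, earning higher reward
best = conservative_update(safe, candidate, env_step)

With delta = 0.05 the deployed policy is swapped only when the improvement holds with probability at least 0.95, which is the sense in which the update is conservative and the performance guarantee probabilistic.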
