Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy

Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

Mar-12-2019, arXiv.org Artificial Intelligence

Due to the high variance of policy gradients, on-policy optimization algorithms suffer from low sample efficiency. In this work, we propose the Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased, low-variance alternative to previous baseline estimators on tasks with binary action spaces, inspired by the recent ARM gradient estimator for discrete random variable models. We show that the ARM policy gradient estimator achieves variance reduction with theoretical guarantees and leads to significantly more stable and faster convergence of policies parameterized by neural networks.
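As a rough illustration of the idea the abstract builds on, the sketch below implements the single-variable ARM gradient estimator for a Bernoulli variable with logit phi and checks it against the closed-form gradient. It is not the paper's policy gradient algorithm: the function names (arm_gradient, exact_gradient), the toy objective f, and the sample size are all assumptions made for this example, and in a reinforcement learning setting f would be replaced by some return or value estimate.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def arm_gradient(f, phi, num_samples, rng):
    """Monte Carlo ARM estimate of d/dphi E_{z ~ Bernoulli(sigmoid(phi))}[f(z)].

    Hypothetical helper: f must accept a NumPy array of 0.0/1.0 values.
    """
    u = rng.uniform(size=num_samples)
    # Two coupled binary samples driven by the same uniform noise u.
    z_swapped = (u > sigmoid(-phi)).astype(float)
    z_main = (u < sigmoid(phi)).astype(float)
    # The term vanishes whenever both samples agree, which keeps variance low.
    return np.mean((f(z_swapped) - f(z_main)) * (u - 0.5))


def exact_gradient(f, phi):
    """Closed-form gradient for a single binary variable, used as a reference."""
    p = sigmoid(phi)
    one, zero = np.array([1.0]), np.array([0.0])
    return float((f(one) - f(zero))[0] * p * (1.0 - p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda z: (z - 0.4) ** 2  # toy objective, assumed for illustration
    phi = 0.3
    print("ARM estimate:", arm_gradient(f, phi, 200_000, rng))
    print("Exact value: ", exact_gradient(f, phi))
```

With enough samples the two printed values should agree closely, which is consistent with the estimator being unbiased; the comparison says nothing about performance on actual control tasks.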



