Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Jun-5-2020–arXiv.org Machine Learning

Entropy augmented to reward is known to soften the greedy argmax policy to softmax policy. Entropy augmentation is reformulated and leads to a motivation to introduce an additional entropy term to the objective function in the form of KL-divergence to regularize optimization process. It results in a policy which monotonically improves while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm which optimize policy and Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that there can be a performance gain using an intermediate algorithm. Both Q-learning[15] and policy gradient(PG)[13] update policy towards greedy one whether the policy is explicit or not.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

Jun-5-2020

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Illinois > Cook County
      - Chicago (0.04)
    - Colorado > Denver County
      - Denver (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe > France
  - Hauts-de-France > Nord > Lille (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found