Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

Kuznetsov, Arsenii, Shvechikov, Pavel, Grishin, Alexander, Vetrov, Dmitry

May-8-2020–arXiv.org Artificial Intelligence

Overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate overestimation bias in the continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. The distributional representation and truncation allow arbitrarily granular control of overestimation, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
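To make the truncation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each critic outputs a fixed number of quantile "atoms" (point estimates of the return), that the atoms of all critics are pooled into one mixture, and that the largest atoms are discarded before averaging. The names `truncated_mixture` and `drop_per_critic` are hypothetical and chosen only for illustration; the abstract does not specify these details.

```python
from typing import List, Sequence


def truncated_mixture(critic_atoms: Sequence[Sequence[float]],
                      drop_per_critic: int) -> List[float]:
    """Pool the atoms of all critics, sort them, and drop the largest ones.

    Assumption: the number of dropped atoms scales with the number of
    critics (drop_per_critic * number of critics).
    """
    n_critics = len(critic_atoms)
    # Ensembling: merge every critic's atoms into a single sorted mixture.
    pooled = sorted(atom for atoms in critic_atoms for atom in atoms)
    n_drop = drop_per_critic * n_critics
    if not 0 <= n_drop < len(pooled):
        raise ValueError("drop_per_critic must leave at least one atom")
    # Truncation: keep only the smallest atoms, discarding the right tail.
    return pooled[:len(pooled) - n_drop]


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Two hypothetical critics with four atoms each; the first has an outlier.
atoms = [
    [1.0, 2.0, 3.0, 10.0],
    [0.5, 2.5, 3.5, 4.0],
]
full = truncated_mixture(atoms, drop_per_critic=0)
kept = truncated_mixture(atoms, drop_per_critic=1)
print(mean(full), mean(kept))
```

In this toy example, dropping one atom per critic removes the two largest pooled values, so the truncated mean is lower than the mean of the full mixture. Varying the number of dropped atoms is what gives fine-grained control over how much the estimate is pushed downward.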

controlling overestimation bias, machine learning, reinforcement learning, (11 more...)


Country:
- Asia > Russia (0.04)
- North America > United States
  - California > Los Angeles County > Long Beach (0.04)
- Europe
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.94)
  - Machine Learning
    - Reinforcement Learning (0.99)
    - Neural Networks (0.68)
