ConQUR: Mitigating Delusional Bias in Deep Q-learning

Su, Andy; Ooi, Jayden; Lu, Tyler; Schuurmans, Dale; Boutilier, Craig

Feb-27-2020, arXiv.org Artificial Intelligence

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
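The penalization idea can be illustrated with a minimal NumPy sketch. It assumes one plausible form of the penalty: a hinge term that grows whenever some action's Q-value exceeds the Q-value of the action that was used to build the bootstrapped label, so that the greedy policy of the trained approximator is nudged to agree with the labels it was trained on. The abstract does not spell out the exact formula, so the hinge form, the function names (`consistency_penalty`, `penalized_loss`), and the parameters `gamma` and `lam` are all assumptions made for illustration.

```python
import numpy as np


def consistency_penalty(q_values, assigned_actions):
    """Hinge-style penalty (assumed form, not taken from the abstract).

    q_values: array of shape (batch, n_actions), Q-values at next states.
    assigned_actions: array of shape (batch,), the action used for each label.
    Returns the total amount by which any action's Q-value exceeds the
    Q-value of the assigned action.
    """
    q_values = np.asarray(q_values, dtype=float)
    assigned_actions = np.asarray(assigned_actions, dtype=int)
    rows = np.arange(q_values.shape[0])
    q_assigned = q_values[rows, assigned_actions][:, None]
    return float(np.maximum(q_values - q_assigned, 0.0).sum())


def penalized_loss(q_sa, rewards, q_next_target, q_next_current,
                   assigned_actions, gamma=0.99, lam=0.5):
    """Squared TD error plus lam times the consistency penalty (a sketch)."""
    q_sa = np.asarray(q_sa, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    q_next_target = np.asarray(q_next_target, dtype=float)
    assigned_actions = np.asarray(assigned_actions, dtype=int)
    rows = np.arange(rewards.shape[0])
    # Labels bootstrap from the assigned action, not from a fresh argmax.
    labels = rewards + gamma * q_next_target[rows, assigned_actions]
    td_error = float(np.sum((labels - q_sa) ** 2))
    return td_error + lam * consistency_penalty(q_next_current, assigned_actions)


q_next = [[1.0, 2.0], [3.0, 0.5]]
# Action 0 was assigned in both states, but action 1 looks better in the first.
print(consistency_penalty(q_next, [0, 0]))
# The assignment matches the greedy action everywhere, so nothing is penalized.
print(consistency_penalty(q_next, [1, 0]))
```

In this sketch the penalty is zero exactly when the assigned actions are greedy under the current Q-values, and the weight `lam` controls how strongly disagreement is discouraged relative to the TD error.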




