Assumed Density Filtering Q-learning
Heejin Jeong, Clark Zhang, Daniel D. Lee
While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been used as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods using Assumed Density Filtering (ADFQ), which updates beliefs on state-action values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs provide a natural regularization for learning, and we show how ADFQ reduces in a limiting case to the traditional Q-learning algorithm. Our empirical results demonstrate that the proposed ADFQ algorithms outperform comparable algorithms on several task domains. Moreover, our algorithms are computationally more efficient than other existing approaches to Bayesian reinforcement learning.
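To make the idea concrete, the sketch below shows a Gaussian-belief Q-learner in the spirit of the abstract: each Q(s, a) carries a mean and variance, the TD target distribution is non-Gaussian because of the max over next-action beliefs, and the posterior is projected back onto a Gaussian by moment matching. This is a hypothetical simplification, not the paper's actual update; in particular, the moments of the max are estimated here by Monte Carlo sampling rather than by the closed-form approximation the authors derive, and the class name, hyperparameters, and Thompson-sampling action rule are assumptions for illustration.

```python
import numpy as np


class ADFQSketch:
    """Minimal sketch of a Gaussian-belief Q-learner (not the paper's exact algorithm).

    Assumption: beliefs over Q(s, a) are independent Gaussians N(mu, var).
    The posterior after a TD observation is non-Gaussian because of the max
    over next-action beliefs, so it is projected back onto a Gaussian by
    matching its first two moments (assumed density filtering). The moments
    are estimated here by sampling, purely for readability.
    """

    def __init__(self, n_states, n_actions, gamma=0.95,
                 obs_noise_var=1.0, prior_var=100.0, n_samples=256):
        self.mu = np.zeros((n_states, n_actions))              # belief means
        self.var = np.full((n_states, n_actions), prior_var)   # belief variances
        self.gamma = gamma
        self.obs_noise_var = obs_noise_var
        self.n_samples = n_samples

    def act(self, s, rng):
        # Thompson sampling from the current beliefs: one natural way to use
        # the uncertainty estimates for exploration (an assumed choice here).
        q_sample = rng.normal(self.mu[s], np.sqrt(self.var[s]))
        return int(np.argmax(q_sample))

    def update(self, s, a, r, s_next, done, rng):
        if done:
            target_mean, target_var = r, 0.0
        else:
            # Sample next-state action values from their Gaussian beliefs,
            # take the max, and moment-match the resulting distribution.
            samples = rng.normal(self.mu[s_next], np.sqrt(self.var[s_next]),
                                 size=(self.n_samples, self.mu.shape[1]))
            targets = r + self.gamma * samples.max(axis=1)
            target_mean, target_var = targets.mean(), targets.var()

        # Treat the TD target as a noisy Gaussian observation of Q(s, a)
        # and combine it with the prior belief (standard Gaussian update).
        prior_mu, prior_var = self.mu[s, a], self.var[s, a]
        total_var = target_var + self.obs_noise_var
        k = prior_var / (prior_var + total_var)   # behaves like a learning rate
        self.mu[s, a] = prior_mu + k * (target_mean - prior_mu)
        self.var[s, a] = (1.0 - k) * prior_var
```

The Kalman-style gain `k` hints at the limiting behaviour described in the abstract: when the belief variances become small relative to the observation noise, the mean update looks like an ordinary TD update with an adaptive step size, so a Q-learning-like rule is recovered in the limit.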
arXiv.org Artificial Intelligence
Jun-10-2018