Assumed Density Filtering Q-learning
Heejin Jeong, Clark Zhang, Daniel D. Lee
While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been used as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods using Assumed Density Filtering (ADFQ), which updates beliefs on state-action values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs provide a natural regularization for learning, and we show how ADFQ reduces in a limiting case to the traditional Q-learning algorithm. Our empirical results demonstrate that the proposed ADFQ algorithms outperform comparable algorithms on several task domains. Moreover, our algorithms are computationally more efficient than other existing approaches to Bayesian reinforcement learning.
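To make the idea concrete, the sketch below shows a Gaussian-belief Q-learner in the spirit of the abstract: each Q(s, a) carries a mean and variance, the TD target distribution is non-Gaussian because of the max over next-action beliefs, and the posterior is projected back onto a Gaussian by moment matching. This is a hypothetical simplification, not the paper's actual update; in particular, the moments of the max are estimated here by Monte Carlo sampling rather than by the closed-form approximation the authors derive, and the class name, hyperparameters, and Thompson-sampling action rule are assumptions for illustration.

```python
import numpy as np


class ADFQSketch:
    """Minimal sketch of a Gaussian-belief Q-learner (not the paper's exact algorithm).

    Assumption: beliefs over Q(s, a) are independent Gaussians N(mu, var).
    The posterior after a TD observation is non-Gaussian because of the max
    over next-action beliefs, so it is projected back onto a Gaussian by
    matching its first two moments (assumed density filtering). The moments
    are estimated here by sampling, purely for readability.
    """

    def __init__(self, n_states, n_actions, gamma=0.95,
                 obs_noise_var=1.0, prior_var=100.0, n_samples=256):
        self.mu = np.zeros((n_states, n_actions))              # belief means
        self.var = np.full((n_states, n_actions), prior_var)   # belief variances
        self.gamma = gamma
        self.obs_noise_var = obs_noise_var
        self.n_samples = n_samples

    def act(self, s, rng):
        # Thompson sampling from the current beliefs: one natural way to use
        # the uncertainty estimates for exploration (an assumed choice here).
        q_sample = rng.normal(self.mu[s], np.sqrt(self.var[s]))
        return int(np.argmax(q_sample))

    def update(self, s, a, r, s_next, done, rng):
        if done:
            target_mean, target_var = r, 0.0
        else:
            # Sample next-state action values from their Gaussian beliefs,
            # take the max, and moment-match the resulting distribution.
            samples = rng.normal(self.mu[s_next], np.sqrt(self.var[s_next]),
                                 size=(self.n_samples, self.mu.shape[1]))
            targets = r + self.gamma * samples.max(axis=1)
            target_mean, target_var = targets.mean(), targets.var()

        # Treat the TD target as a noisy Gaussian observation of Q(s, a)
        # and combine it with the prior belief (standard Gaussian update).
        prior_mu, prior_var = self.mu[s, a], self.var[s, a]
        total_var = target_var + self.obs_noise_var
        k = prior_var / (prior_var + total_var)   # behaves like a learning rate
        self.mu[s, a] = prior_mu + k * (target_mean - prior_mu)
        self.var[s, a] = (1.0 - k) * prior_var
```

The Kalman-style gain `k` hints at the limiting behaviour described in the abstract: when the belief variances become small relative to the observation noise, the mean update looks like an ordinary TD update with an adaptive step size, so a Q-learning-like rule is recovered in the limit.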
arXiv.org Artificial Intelligence
Jun-10-2018