Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration

Lim, Han-Dong, Lee, Donghwan

arXiv.org Artificial Intelligence 

Han-Dong Lim (limaries30@kaist.ac.kr), Donghwan Lee (donghwan@kaist.ac.kr)

Abstract

In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms for solving it: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the existence of a solution to the PBE: the strictly negatively row dominating diagonal (SNRDD) assumption and a condition motivated by the convergence of AVI. The SNRDD assumption also ensures the convergence of linear Q-learning, and we examine its relationship with the convergence of AVI. Lastly, we provide several observations on the solution of the PBE when an ϵ-greedy policy is used.

1 Introduction

Reinforcement learning (RL) has achieved significant success, exemplified by the deep Q-network (DQN) (Mnih et al., 2015). This success can be largely ...
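The abstract does not spell out the update rules, so the following is only a minimal sketch under assumptions of our own. It assumes the common matrix formulation: Q-values stacked in state-action order, a feature matrix Φ with full column rank, a positive diagonal weighting D, a transition matrix P, and a greedy-policy selector Π_θ. Under these assumptions, AVI applies the Bellman optimality backup to Φθ and projects the result onto span(Φ) in the D-weighted norm. A fixed point θ then satisfies the PBE Φθ = Γ(R + γ P Π_θ Φθ) with Γ = Φ(ΦᵀDΦ)⁻¹ΦᵀD, which is equivalent to Φᵀ D (R + γ P Π_θ Φθ − Φθ) = 0. All function and parameter names below are hypothetical.

```python
import numpy as np


def greedy_selector(theta, phi, n_states, n_actions):
    """Pi_theta (|S| x |S||A|): row s picks the greedy action of Q = phi @ theta.

    Assumes (hypothetically) that row s * n_actions + a of phi is the feature of (s, a).
    """
    q = (phi @ theta).reshape(n_states, n_actions)
    greedy = q.argmax(axis=1)
    pi = np.zeros((n_states, n_states * n_actions))
    pi[np.arange(n_states), np.arange(n_states) * n_actions + greedy] = 1.0
    return pi


def bellman_target(theta, phi, p, r, gamma, n_states, n_actions):
    """R + gamma * P Pi_theta Phi theta: one Bellman optimality backup of Phi theta."""
    pi = greedy_selector(theta, phi, n_states, n_actions)
    return r + gamma * (p @ (pi @ (phi @ theta)))


def approximate_value_iteration(phi, d, p, r, gamma, n_states, n_actions,
                                iters=1000, tol=1e-10):
    """theta_{k+1} = (Phi^T D Phi)^{-1} Phi^T D (R + gamma P Pi_k Phi theta_k)."""
    D = np.diag(d)
    gram = phi.T @ D @ phi  # assumed invertible (phi has full column rank, d > 0)
    theta = np.zeros(phi.shape[1])
    for _ in range(iters):
        target = bellman_target(theta, phi, p, r, gamma, n_states, n_actions)
        new_theta = np.linalg.solve(gram, phi.T @ D @ target)  # D-weighted projection
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta  # may not have converged: AVI need not converge for general features


def pbe_residual(theta, phi, d, p, r, gamma, n_states, n_actions):
    """Max-norm of Phi^T D (target - Phi theta); zero exactly when theta solves the PBE."""
    target = bellman_target(theta, phi, p, r, gamma, n_states, n_actions)
    return float(np.max(np.abs(phi.T @ (d * (target - phi @ theta)))))


# Usage: a random 3-state, 2-action MDP with tabular features, where AVI is value iteration.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
p = rng.random((n_states * n_actions, n_states))
p /= p.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
r = rng.random(n_states * n_actions)
d = np.full(n_states * n_actions, 1.0 / (n_states * n_actions))
phi = np.eye(n_states * n_actions)

theta = approximate_value_iteration(phi, d, p, r, gamma, n_states, n_actions)
print(pbe_residual(theta, phi, d, p, r, gamma, n_states, n_actions))
```

With tabular features (Φ = I), the projection is the identity and the iteration is an ordinary γ-contraction, so it converges. With general features this guarantee is lost, which is why conditions for AVI convergence and for existence of a PBE solution are of interest.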

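The abstract names the SNRDD assumption but does not define it. The following minimal sketch assumes the usual definition of a strictly negatively row dominating diagonal matrix A: each diagonal entry is negative and exceeds in magnitude the sum of the absolute off-diagonal entries in its row, i.e. a_ii + Σ_{j≠i} |a_ij| < 0 for every row i. The abstract does not say which matrix the paper imposes this condition on. The matrices below are therefore arbitrary illustrations, and `is_snrdd` is a hypothetical helper.

```python
import numpy as np


def is_snrdd(a):
    """Return True if every row i satisfies a_ii + sum_{j != i} |a_ij| < 0."""
    a = np.asarray(a, dtype=float)
    diag = np.diag(a)
    off_diag = np.abs(a).sum(axis=1) - np.abs(diag)  # sum of |a_ij| over j != i
    return bool(np.all(diag + off_diag < 0))


a_ok = np.array([[-3.0, 1.0, 1.0],
                 [0.5, -2.0, 1.0],
                 [0.0, 1.0, -1.5]])
a_bad = np.array([[-1.0, 2.0],
                  [0.0, -1.0]])

print(is_snrdd(a_ok), is_snrdd(a_bad))  # True False
# Gershgorin: every eigenvalue of an SNRDD matrix has a negative real part.
print(np.linalg.eigvals(a_ok).real.max() < 0)
```

By the Gershgorin circle theorem, every eigenvalue of such a matrix lies in a disc centered at some a_ii with radius Σ_{j≠i} |a_ij|, so its real part is negative. The converse fails: `a_bad` is not SNRDD even though both of its eigenvalues equal −1.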