Sufficient Exploration for Convex Q-learning

Lu, Fan, Mehta, Prashant, Meyn, Sean, Neu, Gergely

arXiv.org Artificial Intelligence 

Ever since the introduction of Watkins' Q-learning algorithm in the 1980s, the research community has searched for a general theory beyond the so-called tabular setting (in which the function class spans all possible functions of state and action). The natural extension of Q-learning to a general function approximation setting seeks to solve what is known as a projected Bellman equation (PBE). There are few results available giving sufficient conditions for the existence of a solution, or for convergence of the algorithm if a solution does exist [24, 17, 10]. Counterexamples show that conditions on the function class are required in general, even in a linear function approximation setting [1, 25, 6]. The GQ-algorithm of [14] is one success story, based on a relaxation of the PBE. Even if existence and stability of the algorithm were settled, we would still face the challenge of interpreting the output of a Q-learning algorithm based on the PBE criterion.
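
For orientation, the PBE mentioned above can be written out in the linear function approximation setting. This is a sketch under assumed notation, none of which appears in the abstract itself: a basis \psi, a parameter \theta with Q^\theta = \theta^\top \psi, a one-step cost c, a discount factor \gamma, and a steady-state expectation under an exploratory input sequence. A cost-minimization convention is assumed; a reward convention would replace the min with a max.

```latex
% Assumed notation (not from the abstract):
%   Q^\theta(x,u) = \theta^\top \psi(x,u), \quad \psi : X \times U \to \mathbb{R}^d
%   \zeta_n = \psi(x_n, u_n)   (the "eligibility" vector in the linear case)
% The PBE asks for a parameter \theta^* satisfying
\[
  \mathsf{E}\Bigl[ \zeta_n \Bigl( c(x_n,u_n)
      + \gamma \min_{u'} Q^{\theta^*}(x_{n+1},u')
      - Q^{\theta^*}(x_n,u_n) \Bigr) \Bigr] = 0 ,
\]
% where the expectation is taken in steady state under the exploration policy.
```

In the tabular setting the basis spans every function of state and action, so this condition reduces to the Bellman equation itself on the state-action pairs visited in steady state. With a restricted basis, a solution \theta^* need not exist, which is the difficulty the abstract refers to.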
