Sufficient Exploration for Convex Q-learning
Lu, Fan, Mehta, Prashant, Meyn, Sean, Neu, Gergely
arXiv.org Artificial Intelligence
Ever since the introduction of Watkins' Q-learning algorithm in the 1980s, the research community has searched for a general theory beyond the tabular setting (in which the function class spans all possible functions of state and action). The natural extension of Q-learning to a general function approximation setting seeks to solve what is known as a projected Bellman equation (PBE). Few results are available that give sufficient conditions for the existence of a solution, or for convergence of the algorithm if a solution does exist [24, 17, 10]. Counterexamples show that conditions on the function class are required in general, even in a linear function approximation setting [1, 25, 6]. The GQ-algorithm of [14] is one success story, based on a relaxation of the PBE. Even if existence and stability of the algorithm were settled, we would still face the challenge of interpreting the output of a Q-learning algorithm based on the PBE criterion.
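For orientation, one common way to write the PBE in the linear setting is sketched below. The notation is an assumption and is not taken from the abstract: $Q^\theta = \theta^\top \psi$ is the approximation with basis $\psi$, $c$ is a one-step cost, $\gamma \in (0,1)$ is a discount factor, and $(X_n, U_n)$ is the state-input process in steady state. A reward-maximizing formulation would replace the min with a max.

```latex
% Sketch of a projected Bellman equation (PBE) for Q-learning with
% linear function approximation; notation is assumed, not from the source.
\[
  0 = \mathsf{E}\Bigl[ \psi(X_n, U_n)\,
        \bigl\{ c(X_n, U_n) + \gamma \min_{u} Q^{\theta^*}(X_{n+1}, u)
                - Q^{\theta^*}(X_n, U_n) \bigr\} \Bigr],
  \qquad Q^{\theta} = \theta^{\top} \psi .
\]
```

Under these assumptions, a solution $\theta^*$, if one exists, makes the temporal-difference error orthogonal to each basis function on average.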
Oct-17-2022
- Country:
- North America > United States
- Illinois (0.04)
- New York > New York County
- New York City (0.04)
- Massachusetts
- Middlesex County > Cambridge (0.14)
- Suffolk County > Boston (0.04)
- Plymouth County > Norwell (0.04)
- Florida > Alachua County
- Gainesville (0.14)
- California > San Francisco County
- San Francisco (0.14)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- France > Île-de-France
- Genre:
- Research Report (0.50)