Regularized Q-Learning with Linear Function Approximation
Jiachen Xi, Alfredo Garcia, Petar Momcilovic
arXiv.org Artificial Intelligence
Several successful reinforcement learning algorithms use regularization to promote multi-modal policies that exhibit enhanced exploration and robustness. With function approximation, the convergence properties of some of these algorithms (e.g., soft Q-learning) are not well understood. In this paper, we consider a single-loop algorithm for minimizing the projected Bellman error, with finite-time convergence guarantees in the case of linear function approximation. The algorithm operates on two scales: a slower scale for updating the target network of the state-action values, and a faster scale for approximating the Bellman backups in the subspace spanned by the basis vectors. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
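To make the two-scale structure in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes entropy regularization with temperature `tau` (as in soft Q-learning), a uniform behavior policy, and a hypothetical toy MDP with random features. The step sizes `alpha` (slow, target parameters) and `beta` (fast, regression onto the feature span) and all function names are illustrative assumptions.

```python
import math
import random


def soft_value(q_values, tau):
    """Entropy-regularized value: tau * log(sum_a exp(q_a / tau))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def softmax_policy(q_values, tau):
    """Boltzmann policy induced by regularized Q-values."""
    m = max(q_values)
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run(num_steps=2000, gamma=0.9, tau=0.5, alpha=0.01, beta=0.1, seed=0):
    """Single-loop, two-timescale sketch with linear features (assumed setup)."""
    rng = random.Random(seed)
    n_states, n_actions, dim = 4, 2, 3
    # Hypothetical toy MDP: random features, rewards, deterministic transitions.
    phi = [[[rng.uniform(-1, 1) for _ in range(dim)]
            for _ in range(n_actions)] for _ in range(n_states)]
    reward = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    nxt = [[rng.randrange(n_states) for _ in range(n_actions)]
           for _ in range(n_states)]

    theta = [0.0] * dim      # fast iterate: fits the Bellman backup
    theta_bar = [0.0] * dim  # slow iterate: target parameters
    s = 0
    for _ in range(num_steps):
        # One transition along a single trajectory (Markovian samples).
        a = rng.randrange(n_actions)
        s2 = nxt[s][a]
        # Regularized backup computed from the slow target parameters.
        q_next = [dot(phi[s2][b], theta_bar) for b in range(n_actions)]
        backup = reward[s][a] + gamma * soft_value(q_next, tau)
        # Fast scale: stochastic least-squares step toward the backup.
        td = backup - dot(phi[s][a], theta)
        theta = [t + beta * td * f for t, f in zip(theta, phi[s][a])]
        # Slow scale: move the target toward the fast iterate.
        theta_bar = [tb + alpha * (t - tb) for tb, t in zip(theta_bar, theta)]
        s = s2
    return theta_bar, phi


theta_bar, phi = run()
policy_at_state_0 = softmax_policy([dot(f, theta_bar) for f in phi[0]], 0.5)
```

Choosing `alpha` much smaller than `beta` keeps the target nearly fixed while the fast iterate fits the backup, which is the separation of scales the abstract describes. The step-size conditions and guarantees in the paper are not reproduced here.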
Jan-26-2024
- Country:
  - North America > United States
    - Texas > Brazos County
      - College Station (0.14)
    - California > Alameda County
      - Berkeley (0.04)
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report (0.82)
- Technology: