Kernelized Reinforcement Learning with Order Optimal Regret Bounds
Vakili, Sattar; Olkhovskaya, Julia
Reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-action pairs or on simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration for the setting where the state-action value function lies in a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show an improvement over the state of the art that is polynomial in the number of episodes. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Matérn kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order-optimal in the case of Matérn kernels, where a lower bound on regret is known.
Nov-28-2023
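
As context for the abstract above, here is a minimal sketch of the kernel ridge regression machinery that optimistic least-squares value iteration methods build on: fit the state-action value function on observed data, then add an exploration bonus proportional to the predictive standard deviation. This is an illustration only, not the paper's $\pi$-KRVI algorithm; the Matérn-5/2 kernel choice, the `KernelRidgeUCB` name, and the `lam`/`beta`/`lengthscale` parameters are all assumptions made for the sketch.

```python
# Illustrative sketch (not the paper's pi-KRVI): kernel ridge regression
# over state-action features with a UCB-style optimistic bonus, the
# standard construction underlying optimistic kernelized value iteration.
import numpy as np
from scipy.spatial.distance import cdist


def matern_52(X, Y, lengthscale=1.0):
    """Matérn-5/2 kernel between rows of X and rows of Y."""
    d = cdist(X, Y) / lengthscale
    return (1.0 + np.sqrt(5) * d + 5.0 * d**2 / 3.0) * np.exp(-np.sqrt(5) * d)


class KernelRidgeUCB:
    """Kernel ridge regression with an optimistic (UCB) value estimate.

    lam:  ridge regularizer; beta: confidence-width multiplier.
    Both are illustrative hyperparameters, not values from the paper.
    """

    def __init__(self, lam=1.0, beta=2.0, lengthscale=1.0):
        self.lam, self.beta, self.lengthscale = lam, beta, lengthscale

    def fit(self, Z, y):
        """Z: n x d array of state-action features; y: n regression targets."""
        self.Z = Z
        A = matern_52(Z, Z, self.lengthscale) + self.lam * np.eye(len(Z))
        self.alpha = np.linalg.solve(A, y)  # kernel ridge weights
        self.A_inv = np.linalg.inv(A)
        return self

    def optimistic_q(self, z):
        """Upper confidence bound on Q at query points z (m x d)."""
        k = matern_52(z, self.Z, self.lengthscale)  # m x n cross-kernel
        mean = k @ self.alpha
        # Predictive variance: k(z, z) - k^T (K + lam I)^{-1} k, per query point.
        var = matern_52(z, z, self.lengthscale).diagonal() - np.einsum(
            "ij,jk,ik->i", k, self.A_inv, k
        )
        return mean + self.beta * np.sqrt(np.maximum(var, 0.0))


# Toy usage with random features and noisy targets.
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 3))
y = np.sin(Z).sum(axis=1) + 0.1 * rng.standard_normal(50)
ucb = KernelRidgeUCB().fit(Z, y).optimistic_q(rng.standard_normal((5, 3)))
```

In an episodic setting, one such regression would typically be fit per step of the horizon, with targets given by the observed reward plus the optimistic value of the next state.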