Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Oct-9-2024, 16:28:59 GMT–Neural Information Processing Systems

Modern reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. Existing analytical results, however, typically focus on settings with a small number of state-action pairs or on simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose π-KRVI, an optimistic modification of least-squares value iteration, for the case where the action-value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting.
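To make the idea of an optimistic estimate built on kernel ridge regression concrete, the sketch below computes a kernel ridge regression prediction and adds an uncertainty bonus to it. It is only a generic illustration and not the π-KRVI algorithm itself, whose details are not given in this abstract. The squared-exponential kernel, the function names `rbf_kernel` and `optimistic_krr`, and the parameters `lam`, `beta`, and `lengthscale` are all assumptions chosen for the example.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel between the rows of X and the rows of Y.

    The kernel choice is an assumption made for this sketch.
    """
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * lengthscale**2))


def optimistic_krr(Z_train, y_train, Z_query, lam=1.0, beta=1.0, lengthscale=1.0):
    """Kernel ridge regression prediction plus an uncertainty bonus.

    Z_train: (n, d) observed state-action features (hypothetical encoding).
    y_train: (n,) regression targets, e.g. reward plus next-step value.
    Z_query: (m, d) state-action features to evaluate.
    Returns (ucb, mean, sigma), each of shape (m,).
    """
    K = rbf_kernel(Z_train, Z_train, lengthscale)
    k_q = rbf_kernel(Z_train, Z_query, lengthscale)  # shape (n, m)
    A = K + lam * np.eye(len(Z_train))

    # Ridge-regularized prediction: k(z)^T (K + lam I)^{-1} y.
    mean = k_q.T @ np.linalg.solve(A, y_train)

    # One common convention for the uncertainty:
    # sigma^2(z) = k(z, z) - k(z)^T (K + lam I)^{-1} k(z), with k(z, z) = 1 here.
    var = 1.0 - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    sigma = np.sqrt(np.maximum(var, 0.0))

    # Optimism: inflate the prediction by beta times the uncertainty.
    return mean + beta * sigma, mean, sigma
```

In a value-iteration loop, an agent would typically act greedily with respect to the optimistic value `ucb`, so that rarely visited state-action pairs, where `sigma` is large, are explored. How the bonus multiplier `beta` is set is where a regret analysis does its work; the constant used here is purely a placeholder.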

kernelized reinforcement learning, order optimal regret bound, state-action space

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)