Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension
Wu, Yue, He, Jiafan, Gu, Quanquan
–arXiv.org Artificial Intelligence
Designing efficient algorithms to learn and plan in the sequential decision-making environment modeled by a Markov decision process (MDP) is one of the main tasks in reinforcement learning (RL). However, traditional tabular RL algorithms suffer from the curse-of-dimensionality due to the large size of the state and action spaces in practice. To enable learning in high-dimensional state and action spaces, using a predefined function class to approximate the underlying transition dynamic or the value function is a common approach. Most existing works for RL with function approximation focus on simple linear function classes such as the linear mixture MDP (Modi et al., 2020; Ayoub et al., 2020; Zhou et al., 2021b), which can replace the size of the state and action spaces with the dimension of the linear function class. However, these assumptions are often too restrictive to hold in practice. Recently, a line of works (Russo and Van Roy, 2013; Du et al., 2021; Jin et al., 2021) emerged that studies RL with general function approximation, introducing new complexity measures for the general function class and proposing new algorithms with regret bounds or PAC guarantees in terms of the complexity of the general function class. All existing results of RL with a general function class are limited to either regret bounds or PAC sample complexity, both of which cannot ensure convergence to the optimal policy up to arbitrary accuracy.
arXiv.org Artificial Intelligence
May-15-2023
- Country:
- North America > United States
- California > Los Angeles County > Los Angeles (0.28)
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Genre:
- Research Report (0.50)