Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension

May-15-2023–arXiv.org Artificial Intelligence

Designing efficient algorithms to learn and plan in the sequential decision-making environment modeled by a Markov decision process (MDP) is one of the main tasks in reinforcement learning (RL). However, traditional tabular RL algorithms suffer from the curse-of-dimensionality due to the large size of the state and action spaces in practice. To enable learning in high-dimensional state and action spaces, using a predefined function class to approximate the underlying transition dynamic or the value function is a common approach. Most existing works for RL with function approximation focus on simple linear function classes such as the linear mixture MDP (Modi et al., 2020; Ayoub et al., 2020; Zhou et al., 2021b), which can replace the size of the state and action spaces with the dimension of the linear function class. However, these assumptions are often too restrictive to hold in practice. Recently, a line of works (Russo and Van Roy, 2013; Du et al., 2021; Jin et al., 2021) emerged that studies RL with general function approximation, introducing new complexity measures for the general function class and proposing new algorithms with regret bounds or PAC guarantees in terms of the complexity of the general function class. All existing results of RL with a general function class are limited to either regret bounds or PAC sample complexity, both of which cannot ensure convergence to the optimal policy up to arbitrary accuracy.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

May-15-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California > Los Angeles County > Los Angeles (0.28)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found