When is Agnostic Reinforcement Learning Statistically Tractable?

Jia, Zeyu, Li, Gene, Rakhlin, Alexander, Sekhari, Ayush, Srebro, Nathan

Oct-9-2023–arXiv.org Machine Learning

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an $\epsilon$-suboptimal policy with respect to $\Pi$? Towards that end, we introduce a new complexity measure, called the \emph{spanning capacity}, that depends solely on the set $\Pi$ and is independent of the MDP dynamics. With a generative model, we show that for any policy class $\Pi$, bounded spanning capacity characterizes PAC learnability. However, for online RL, the situation is more subtle. We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional \emph{sunflower} structure, which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Machine Learning

Oct-9-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Washington > King County > Seattle (0.04)
- Europe > United Kingdom
  - England
    - Greater London > London (0.04)
    - Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found