When is Agnostic Reinforcement Learning Statistically Tractable?

Neural Information Processing Systems 

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class \(\Pi\), how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an \(\epsilon\)-suboptimal policy with respect to \(\Pi\)? Towards that end, we introduce a new complexity measure, called the \emph{spanning capacity}, that depends solely on the set \(\Pi\) and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class \(\Pi\). However, for online RL, the situation is more subtle. We show there exists a policy class \(\Pi\) with bounded spanning capacity that requires a superpolynomial number of samples to learn.
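
For concreteness, the \(\epsilon\)-suboptimality criterion described above can be written as follows (a sketch of the standard agnostic PAC guarantee, with \(V^{\pi}\) denoting the expected return of policy \(\pi\) in the underlying MDP, a notation assumed here rather than taken from the abstract):

```latex
\[
V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} \;-\; \epsilon,
\]
```

where \(\hat{\pi}\) is the policy returned by the learner after its rounds of interaction, and the guarantee is required to hold with high probability. Note that \(\hat{\pi}\) need not itself belong to \(\Pi\); it must only compete with the best policy in \(\Pi\).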