Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
– Neural Information Processing Systems
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space $S$ and action space $A$, assuming access to a generative model. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $|S||A|/(1-\gamma)^2$ (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of $|S||A|/(1-\gamma)$ (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an $\epsilon$-optimal policy with an order of $|S||A|/((1-\gamma)^3\epsilon^2)$ samples (up to log factor) for any $\epsilon \in (0, 1/(1-\gamma)]$.
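For reference, a restatement of the bound claimed in the abstract, with constants and logarithmic factors omitted; the symbol $N$ (the total number of samples drawn from the generative model) is introduced here for illustration and is not part of the original text:
$$
N \;\gtrsim\; \frac{|S|\,|A|}{(1-\gamma)^{3}\,\epsilon^{2}}
\qquad \text{for every } \epsilon \in \Big(0, \tfrac{1}{1-\gamma}\Big],
$$
with the minimax-optimal guarantee holding as soon as $N$ exceeds the order of $|S||A|/(1-\gamma)$ (modulo log factors), as stated above.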