Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
Neural Information Processing Systems
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting, where the state space \mathcal{S} and the action space \mathcal{A} are both finite, the minimax-optimal sample complexity for obtaining a near-optimal policy with sampling access to a generative model scales linearly with |\mathcal{S}| \times |\mathcal{A}|, which can be prohibitively large when \mathcal{S} or \mathcal{A} is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an \varepsilon-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of \frac{K}{(1-\gamma)^{3}\varepsilon^{2}} (resp. \frac{K}{(1-\gamma)^{4}\varepsilon^{2}}), up to some logarithmic factor, where K is the feature dimension and \gamma \in (0, 1) is the discount factor of the MDP.
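For context, the display below is a minimal sketch of the linear transition-kernel structure that such a feature representation induces; the symbols \phi (the known state-action feature map) and \psi_k (the unknown factors) are illustrative names, not taken from the abstract:

P(s' \mid s, a) = \sum_{k=1}^{K} \phi_k(s, a)\, \psi_k(s'), \qquad \phi(s, a) = \big(\phi_1(s, a), \ldots, \phi_K(s, a)\big) \in \mathbb{R}^{K}.

Under this structure, only the K unknown factors \psi_k need to be estimated from generative-model samples, rather than a separate transition distribution for each of the |\mathcal{S}| \times |\mathcal{A}| state-action pairs, which is what replaces the |\mathcal{S}| \times |\mathcal{A}| dependence of the tabular bound with the K dependence quoted above.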