Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Dec-24-2025, 20:53:48 GMT–Neural Information Processing Systems

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting, where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, the minimax optimal sample complexity for obtaining a near-optimal policy with sampling access to a generative model scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features that can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning)
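To make the structural assumption concrete, here is a minimal sketch of a transition kernel that is linear in state-action features. The abstract does not spell out the exact parameterization, so the factorization used here, $P(s' \mid s, a) = \sum_{k=1}^{K} \phi_k(s,a)\,\psi_k(s')$ with each feature vector $\phi(s,a)$ a probability vector and each $\psi_k$ a distribution over next states, is an assumption chosen for illustration. The function name `make_linear_mdp` and all sizes are hypothetical.

```python
import numpy as np


def make_linear_mdp(num_states, num_actions, num_features, seed=0):
    """Build a random transition kernel of the assumed form P = Phi @ Psi.

    Illustrative assumption, not taken from the abstract:
      - Phi has one row per (state, action) pair, and each row is a
        probability vector over the K features.
      - Psi has one row per feature, and each row is a probability
        distribution over next states.
    Under these assumptions every row of P is a convex combination of
    distributions, hence itself a valid distribution.
    """
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.ones(num_features), size=num_states * num_actions)
    psi = rng.dirichlet(np.ones(num_states), size=num_features)
    transition = phi @ psi  # shape: (|S| * |A|, |S|)
    return phi, psi, transition


num_states, num_actions, num_features = 50, 4, 3
phi, psi, transition = make_linear_mdp(num_states, num_actions, num_features)

# Each row of the kernel is a probability distribution over next states.
rows_are_distributions = bool(
    np.allclose(transition.sum(axis=1), 1.0) and (transition >= 0).all()
)

# The kernel has rank at most K, however many state-action pairs there are.
kernel_rank = int(np.linalg.matrix_rank(transition))

# Unknown entries: tabular kernel versus the K unknown distributions in Psi
# (the features Phi are treated as given in this sketch).
tabular_entries = num_states * num_actions * num_states
linear_entries = num_features * num_states

print(rows_are_distributions, kernel_rank, tabular_entries, linear_entries)
```

In this toy setup the quantity to be learned is governed by the number of features $K$ rather than by $|\mathcal{S}|\times|\mathcal{A}|$, which is the intuition behind seeking sample complexities that do not scale with the size of the state-action space.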
