Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
Neural Information Processing Systems
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting, where the state space \mathcal{S} and the action space \mathcal{A} are both finite, the minimax-optimal sample complexity for obtaining a near-optimal policy with sampling access to a generative model scales linearly with |\mathcal{S}| \times |\mathcal{A}|, which can be prohibitively large when \mathcal{S} or \mathcal{A} is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an \varepsilon-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of \frac{K}{(1-\gamma)^{3}\varepsilon^{2}} (resp. \frac{K}{(1-\gamma)^{4}\varepsilon^{2}}), up to some logarithmic factor, where K is the feature dimension and \gamma \in (0, 1) is the discount factor of the MDP.
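For context, the display below is a minimal sketch of the linear transition-kernel structure that such a feature representation induces; the symbols \phi (the known state-action feature map) and \psi_k (the unknown factors) are illustrative names, not taken from the abstract:

P(s' \mid s, a) = \sum_{k=1}^{K} \phi_k(s, a)\, \psi_k(s'), \qquad \phi(s, a) = \big(\phi_1(s, a), \ldots, \phi_K(s, a)\big) \in \mathbb{R}^{K}.

Under this structure, only the K unknown factors \psi_k need to be estimated from generative-model samples, rather than a separate transition distribution for each of the |\mathcal{S}| \times |\mathcal{A}| state-action pairs, which is what replaces the |\mathcal{S}| \times |\mathcal{A}| dependence of the tabular bound with the K dependence quoted above.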