AnExponentialLowerBoundforLinearly-Realizable MDPswithConstantSuboptimalityGap
–Neural Information Processing Systems
A fundamental question in the theory of reinforcement learning is: suppose the optimalQ-function lies inthe linear span ofagivenddimensional feature mapping, is sample-efficient reinforcement learning (RL) possible? The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providinganexponential(ind)samplesizelowerbound,whichholdsevenifthe agent has access to a generative model of the environment. One may hope that such a lower can be circumvented with an even stronger assumption that there isaconstant gapbetween the optimalQ-value ofthe best action and that ofthe second-best action (for allstates); indeed, the construction inWeisz etal.
Neural Information Processing Systems
Feb-8-2026, 15:06:49 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East
- Genre:
- Research Report (0.68)
- Technology: