A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster

Neural Information Processing Systems (NeurIPS), 2022

The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large.
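For concreteness, the linear realizability assumption referenced here can be stated in standard notation (a sketch; the feature map $\phi$, dimension $d$, and parameter $\theta^\star$ are assumed notation for this excerpt, and the paper's own symbols may differ):
\[
  Q^\star(s, a) \;=\; \langle \phi(s, a),\, \theta^\star \rangle
  \qquad \text{for all } (s, a) \in \mathcal{S} \times \mathcal{A},
\]
where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is a known feature map and $\theta^\star \in \mathbb{R}^d$ is an unknown parameter vector. Only the optimal value function is assumed to take this linear form; no such structure is assumed for other policies' value functions or for the transition dynamics.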