A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
Philip Amortila, Nan Jiang, Dean P. Foster
–Neural Information Processing Systems
This paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable. Recent work has shown that, even under this seemingly strong assumption and with access to a generative model, worst-case sample complexity can be prohibitively (i.e., exponentially) large.
Mar-27-2025, 15:06:37 GMT