A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
–Neural Information Processing Systems
The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires \tilde O(d) expert queries and a \texttt{poly}(d,H, A,1/\varepsilon) amount of exploratory samples to provably recover an \varepsilon -suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly-little expert input.
Neural Information Processing Systems
Jan-18-2025, 18:23:36 GMT
- Technology: