A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

Jan-18-2025, 18:23:36 GMT–Neural Information Processing Systems

The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires \tilde O(d) expert queries and a \texttt{poly}(d,H, A,1/\varepsilon) amount of exploratory samples to provably recover an \varepsilon -suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly-little expert input.

expert query suffice, reset and linear value approximation, sample-efficient rl, (6 more...)

Neural Information Processing Systems

Jan-18-2025, 18:23:36 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.58)