Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Neural Information Processing Systems
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates linear realizability of the optimal Q-function (also called the "linear $Q^\star$ problem"). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available, but is unfortunately infeasible when we turn to standard online RL settings. We make progress towards understanding this linear $Q^\star$ problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states.
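For concreteness, a standard formalization of the two assumptions named in the abstract is sketched below; the feature map $\phi$, parameter $\theta^\star$, and gap $\Delta$ are conventional notation in this literature rather than symbols taken from the listing itself. Linear realizability of the optimal Q-function asserts
\[
  Q^\star(s,a) \;=\; \big\langle \phi(s,a),\, \theta^\star \big\rangle
  \qquad \text{for all } (s,a) \in \mathcal{S} \times \mathcal{A},
\]
where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is a known $d$-dimensional feature map and $\theta^\star \in \mathbb{R}^d$ is an unknown parameter vector; crucially, no linearity is imposed on the Q-functions of other policies. The sub-optimality gap is the smallest margin by which any sub-optimal action falls short of the optimal value $V^\star(s) = \max_a Q^\star(s,a)$:
\[
  \Delta \;:=\; \min_{\substack{(s,a)\,:\, Q^\star(s,a) < V^\star(s)}} \big\{ V^\star(s) - Q^\star(s,a) \big\}.
\]
A "large" gap in the sense of the abstract means $\Delta$ is bounded away from zero, so that identifying the optimal action at each state tolerates estimation error up to roughly $\Delta/2$.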