Is Long Horizon RL More Difficult Than Short Horizon RL?

May-27-2025, 01:38:20 GMT–Neural Information Processing Systems

Learning to plan over long horizons is a central challenge in episodic reinforcement learning. A fundamental question is how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the number of episodes it takes to provably discover a policy whose value is within ε of the optimal value, where the value is measured by the normalized cumulative reward in each episode. In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound that exhibits a polynomial dependence on the horizon, a conjecture consistent with all known sample complexity upper bounds. This work refutes the conjecture, proving that tabular, episodic reinforcement learning is possible with a sample complexity that scales only logarithmically with the planning horizon.
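
The normalized objective described in the abstract can be written out as follows. This is only a sketch under assumed notation: the symbols H (horizon), r_h (reward at step h), V^π (value of policy π) and π̂ (the learned policy) do not appear in the abstract, and the per-step reward range [0, 1] is an assumption chosen to make the normalization concrete.

```latex
% Assumed notation: horizon H, per-step rewards r_h \in [0, 1], policy \pi.
% Normalized value: expected average reward over one episode.
V^{\pi} \;=\; \mathbb{E}_{\pi}\!\left[ \frac{1}{H} \sum_{h=1}^{H} r_h \right] \in [0, 1],
\qquad
V^{\star} \;=\; \max_{\pi} V^{\pi}.

% Learning goal: after some number of episodes, output \hat{\pi} such that
V^{\star} - V^{\hat{\pi}} \;\le\; \varepsilon .
```

Because the value stays in [0, 1] for every H under this assumption, ε means the same thing at every horizon, so any dependence on H left in the episode count reflects the difficulty of long-horizon planning rather than the scale of the reward.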

artificial intelligence, machine learning, reinforcement learning




Industry:
- Education > Focused Education > Special Education (0.51)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)