On the hardness of RL with Lookahead
Pla, Corentin, Richard, Hugo, Abeille, Marc, Merlis, Nadav, Perchet, Vianney
We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.
Oct-23-2025
- Country:
- North America > United States
- New York (0.04)
- New Jersey > Hudson County
- Hoboken (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Europe > France
- Île-de-France > Paris > Paris (0.04)
- Asia > Middle East
- Jordan (0.04)
- Israel > Haifa District
- Haifa (0.04)
- North America > United States
- Genre:
- Research Report > New Finding (0.66)