Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Jun-13-2026, 10:33:11 GMT–Neural Information Processing Systems

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation.

artificial intelligence, machine learning, reinforcement learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)