Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Neural Information Processing Systems
Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation.
Jun-13-2026, 10:33:11 GMT