The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

May-1-2026, 01:33:10 GMT–Neural Information Processing Systems

While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with the optimal achievable cost. In particular, our bounds converge much faster than those of non-distributional approaches when the optimal cost is small. As a warm-up, we propose a distributional contextual bandit (DistCB) algorithm, which we show enjoys small-loss regret bounds and empirically outperforms the state of the art on three real-world tasks. In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation. We prove that our algorithm enjoys novel small-loss PAC bounds in low-rank MDPs. As part of our analysis, we introduce the ℓ1 distributional eluder dimension, which may be of independent interest. Finally, in offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and are more robust to bad single-policy coverage.
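
To make "scales with the optimal achievable cost" concrete, the following is a generic sketch of how a small-loss (first-order) regret bound differs from a worst-case one. It is not the paper's theorem: the symbols K (number of episodes), C^\star (optimal expected per-episode cost, with costs normalized to [0, 1]), and d (a problem-dependent complexity term) are assumptions introduced here for illustration, and constants and logarithmic factors are omitted.

```latex
% Worst-case bound: grows as \sqrt{K} regardless of the instance.
\mathrm{Regret}(K) \;\lesssim\; \sqrt{d \, K}

% Small-loss bound: the horizon K is replaced by the cumulative optimal cost K C^\star.
\mathrm{Regret}(K) \;\lesssim\; \sqrt{d \, K \, C^\star} \;+\; d,
\qquad C^\star \in [0, 1]
```

Under these assumptions, when C^\star is close to 0 the second bound is of order d, independent of K, so the average regret decays like 1/K rather than 1/\sqrt{K}. When C^\star is of constant order, the two bounds have the same \sqrt{K} growth.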

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report (0.66)
- Workflow (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.54)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.54)
