Bellman Calibration for V-Learning in Offline Reinforcement Learning
Lars van der Laan, Nathan Kallus
We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov decision processes. Bellman calibration requires that states with similar predicted long-term returns exhibit one-step returns consistent with the Bellman equation under the target policy. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting by repeatedly regressing fitted Bellman targets onto a model's predictions, using a doubly robust pseudo-outcome to handle off-policy data. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator. Our analysis provides finite-sample guarantees for both calibration and prediction under weak assumptions and, critically, without requiring Bellman completeness or realizability.
Dec-30-2025
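At a high level, the procedure amounts to repeatedly fitting a one-dimensional regression of Bellman targets onto the baseline model's predictions. Below is a minimal sketch of that loop using scikit-learn's isotonic regression, with a plain importance-weighted Bellman target standing in for the paper's doubly robust pseudo-outcome; the names `f`, `pi_target`, `pi_behavior`, and all arguments are hypothetical placeholders, not the authors' API.

```python
# Sketch of iterated Bellman calibration (assumptions: `f` maps a batch of
# states to a 1-D NumPy array of value predictions; `pi_target`/`pi_behavior`
# return action probabilities; inputs are NumPy arrays of transitions).
from sklearn.isotonic import IsotonicRegression

def bellman_calibrate(f, states, actions, rewards, next_states,
                      pi_target, pi_behavior, gamma=0.99, n_iters=20):
    """Return a monotone map theta such that theta(f(s)) is Bellman-calibrated."""
    preds = f(states)            # baseline predictions at current states
    next_preds = f(next_states)  # baseline predictions at successor states
    # One-step importance weights to handle off-policy data; the paper uses
    # a doubly robust pseudo-outcome rather than this plain IS correction.
    w = pi_target(actions, states) / pi_behavior(actions, states)

    theta = IsotonicRegression(out_of_bounds="clip")
    v_next = next_preds.copy()   # current calibrated values at s'
    for _ in range(n_iters):
        # Fitted Bellman target under the target policy.
        y = w * (rewards + gamma * v_next)
        # One-dimensional regression of Bellman targets onto the model's
        # predictions: the isotonic calibration step, iterated to a fixed point.
        theta.fit(preds, y)
        v_next = theta.predict(next_preds)
    return theta
```

Because the calibration map is fit by isotonic regression on the model's own predictions, it is monotone, so the relative ordering of the baseline estimator's values is preserved while their scale is corrected toward Bellman consistency.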