Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Neural Information Processing Systems 

Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings.
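
As a point of reference for what a distribution-free prediction interval for a return looks like, the following is a minimal sketch of standard split conformal prediction applied to discounted returns. It is not the framework proposed here. It assumes on-policy calibration data that are exchangeable with the test point, and it uses finite truncated rollouts as a stand-in for infinite-horizon returns. The function names `discounted_return` and `split_conformal_interval` and all parameters are hypothetical, introduced only for illustration.

```python
import numpy as np


def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence (a truncated rollout)."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))


def split_conformal_interval(cal_returns, cal_predictions, test_prediction, alpha=0.1):
    """Generic split conformal interval around a point prediction of the return.

    Assumption: calibration pairs and the test point are exchangeable
    (for example, on-policy rollouts from i.i.d. initial states).
    """
    cal_returns = np.asarray(cal_returns, dtype=float)
    cal_predictions = np.asarray(cal_predictions, dtype=float)

    # Nonconformity score: absolute residual between observed and predicted return.
    scores = np.abs(cal_returns - cal_predictions)
    n = len(scores)

    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for the requested level: trivial interval.
        return -np.inf, np.inf

    q = np.sort(scores)[k - 1]
    return test_prediction - q, test_prediction + q
```

Under the exchangeability assumption, the interval returned by this sketch contains the test return with probability at least 1 - alpha, regardless of how accurate the point predictor is. The off-policy setting breaks exchangeability between calibration and test data, so this plain version does not cover it.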