STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

Jun-14-2026, 14:13:19 GMT–Neural Information Processing Systems

Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE introduces two technical innovations that make it advantageous for OPE: (1) it prevents overregularization by subtracting the score of the behavior policy during guidance, and (2) it generates long-horizon trajectories by stitching partial trajectories together end-to-end. We provide a theoretical guarantee that under mild assumptions, these modifications result in an exponential reduction in variance versus long-horizon trajectory diffusion.
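
The two ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes Gaussian policies with a shared standard deviation, and the function names (`gaussian_policy_score`, `guidance_direction`, `stitch_windows`) and the `generate_window` callback are hypothetical placeholders for a guided diffusion sampler that the abstract does not specify.

```python
import numpy as np


def gaussian_policy_score(action, mean, std):
    """Gradient of log N(action; mean, std^2 I) with respect to the action."""
    return -(action - mean) / std**2


def guidance_direction(action, target_mean, behavior_mean, std):
    """Target-policy score minus behavior-policy score.

    Assumption: both policies are Gaussian with the same std, so the
    difference reduces to (target_mean - behavior_mean) / std^2.
    """
    target_score = gaussian_policy_score(action, target_mean, std)
    behavior_score = gaussian_policy_score(action, behavior_mean, std)
    return target_score - behavior_score


def stitch_windows(generate_window, initial_state, num_windows):
    """Chain short windows end-to-end into one long trajectory.

    `generate_window` is a hypothetical stand-in for a guided sampler: it
    takes the current state and returns an array of shape
    (window_len, state_dim) holding the states that follow it.
    """
    states = [initial_state]
    for _ in range(num_windows):
        window = generate_window(states[-1])  # condition on the last state so far
        states.extend(window)
    return np.stack(states)


if __name__ == "__main__":
    direction = guidance_direction(
        action=np.array([0.5]),
        target_mean=np.array([1.0]),
        behavior_mean=np.array([0.0]),
        std=1.0,
    )
    print("guidance direction:", direction)

    # Toy window generator: two deterministic steps from the given state.
    trajectory = stitch_windows(
        lambda s: np.stack([s + 1.0, s + 2.0]), np.zeros(1), num_windows=3
    )
    print("stitched trajectory shape:", trajectory.shape)
```

In this toy setting the guidance direction does not depend on the sampled action, only on the gap between the two policy means. The stitching helper only shows the chaining pattern: each short window is conditioned on the final state of the previous one.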

artificial intelligence, machine learning, target policy

Country:
- North America > Canada (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Health & Medicine (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (0.87)
  - Representation & Reasoning
    - Uncertainty (0.92)
    - Agents (0.67)
  - Machine Learning
    - Neural Networks (0.93)
    - Statistical Learning (0.92)
