AITopics | ope method

STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

Neural Information Processing SystemsJun-10-2026, 02:23:14 GMT

Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations that make it advantageous for OPE: (1) prevents over-regularization by subtracting the score of the behavior policy during guidance, and (2) generates long-horizon trajectories by stitching partial trajectories together end-to-end. We provide a theoretical guarantee that under mild assumptions, these modifications result in an exponential reduction in variance versus long-horizon trajectory diffusion. Experiments on the D4RL and OpenAI Gym benchmarks show substantial improvement in mean squared error, correlation, and regret metrics compared to state-of-the-art OPE methods.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Add feedback

eb86863306d17c58d18315b5261a4520-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-18-2026, 14:19:57 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > South Korea (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
(2 more...)

Add feedback

5ee7ed60a7e8169012224dec5fe0d27f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 08:17:19 GMT

algorithm, dataset, selection, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

1cb57fcf7ff3f6d37eebae5becc9ea6d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 14:55:27 GMT

machine learning, natural language, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Surgery (0.69)
Health & Medicine > Therapeutic Area > Oncology (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Off-Policy Evaluation for Human Feedback

Neural Information Processing SystemsDec-24-2025, 03:09:05 GMT

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and are only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over, adaptive neurostimulation and intelligent tutoring, and a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.

human feedback, name change, off-policy evaluation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.79)

Add feedback

eb86863306d17c58d18315b5261a4520-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-10-2025, 20:28:49 GMT

disease state, epicare, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
(2 more...)

Add feedback

5ee7ed60a7e8169012224dec5fe0d27f-Supplemental-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 15:50:31 GMT

algorithm, dataset, hyperparameter, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.72)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

Neural Information Processing SystemsAug-15-2025, 05:18:56 GMT

While this is a common approach in supervised learning, to our knowledge, this has not been discussed in detail in the offline RL setting.

algorithm, dataset, selection, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Off-Policy Evaluation for Human Feedback

Neural Information Processing SystemsOct-10-2024, 06:22:40 GMT

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and are only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals.

artificial intelligence, machine learning, off-policy evaluation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

Lobo, Elita, Singh, Harvineet, Petrik, Marek, Rudin, Cynthia, Lakkaraju, Himabindu

arXiv.org Artificial IntelligenceApr-6-2024

Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to marginal adversarial perturbations to the data. We design a generic data poisoning attack framework leveraging influence functions from robust statistics to carefully construct perturbations that maximize error in the policy value estimates. We carry out extensive experimentation with multiple healthcare and control datasets. Our results demonstrate that many existing OPE methods are highly prone to generating value estimates with large errors when subject to data poisoning attacks, even for small adversarial perturbations. These findings question the reliability of policy values derived using OPE methods and motivate the need for developing OPE methods that are statistically robust to train-time data poisoning attacks.

dope attack, ope method, value function estimate, (15 more...)

arXiv.org Artificial Intelligence

2404.04714

Country: