AITopics | behavior-agnostic estimation

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Neural Information Processing Systems, Dec-25-2025, 23:57:18 GMT

In many real-world reinforcement learning applications, access to the environment is limited to a fixed dataset, instead of direct (online) interaction with the environment. When using this data for either evaluation or training of a new policy, accurate estimates of discounted stationary distribution ratios -- correction terms which quantify the likelihood that the new policy will experience a certain state-action pair normalized by the probability with which the state-action pair appears in the dataset -- can improve accuracy and performance. In this work, we propose an algorithm, DualDICE, for estimating these quantities. In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.

behavior-agnostic estimation, dualdice, stationary distribution correction, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Reviews: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Neural Information Processing Systems, Nov-18-2025, 22:16:52 GMT

NeurIPS 2019, Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

"1361" "DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections"

Reviewer 1

Originality: I find this work to be original and the proposed algorithm to be novel. The authors clearly state what their contributions are and how their work differs from prior work.

Clarity/Quality: The paper is clearly written and easy to follow. The authors do a great job of stating the problem they consider, explaining existing solutions and their drawbacks, and then thoroughly building up the intuition behind their approach. Each theoretical step makes sense and is intuitive. I also appreciate the authors taking the time to derive their method using a simple convex function and then demonstrating that it is possible to extend the method to a more general set of functions.
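For readers without the paper at hand, the "simple convex function" step the reviewer mentions can be sketched roughly as follows. This is a reconstruction under assumed notation (the symbols $m$, $n$, $x$, $f$, $d^\pi$ and $d^{\mathcal{D}}$ do not appear in the review text), not a quotation of the authors' derivation.

```latex
% Scalar case: a quadratic whose minimizer is a ratio.
\min_{x \in \mathbb{R}} \; \tfrac{1}{2} m x^2 - n x, \quad m > 0
\quad\Longrightarrow\quad x^* = \frac{n}{m}.

% Applied pointwise, with m = d^{\mathcal{D}}(s,a) and n = d^{\pi}(s,a):
\min_{x : S \times A \to \mathbb{R}} \;
  \tfrac{1}{2}\, \mathbb{E}_{(s,a) \sim d^{\mathcal{D}}}\!\left[ x(s,a)^2 \right]
  - \mathbb{E}_{(s,a) \sim d^{\pi}}\!\left[ x(s,a) \right]
\quad\Longrightarrow\quad
x^*(s,a) = \frac{d^{\pi}(s,a)}{d^{\mathcal{D}}(s,a)}.

% Generalization: replace x^2/2 by a strictly convex, differentiable f.
\min_{x} \; \mathbb{E}_{(s,a) \sim d^{\mathcal{D}}}\!\left[ f(x(s,a)) \right]
  - \mathbb{E}_{(s,a) \sim d^{\pi}}\!\left[ x(s,a) \right]
\quad\Longrightarrow\quad
f'\!\left(x^*(s,a)\right) = \frac{d^{\pi}(s,a)}{d^{\mathcal{D}}(s,a)}.
```

Each implication follows by setting the pointwise derivative to zero; the quadratic case is recovered with $f(x) = \tfrac{1}{2}x^2$, for which $f'(x) = x$.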

artificial intelligence, objective function, stationary distribution correction, (11 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

Nachum, Ofir, Chow, Yinlam, Dai, Bo, Li, Lihong

Neural Information Processing Systems, Mar-18-2020, 21:17:23 GMT

In many real-world reinforcement learning applications, access to the environment is limited to a fixed dataset, instead of direct (online) interaction with the environment. When using this data for either evaluation or training of a new policy, accurate estimates of discounted stationary distribution ratios -- correction terms which quantify the likelihood that the new policy will experience a certain state-action pair normalized by the probability with which the state-action pair appears in the dataset -- can improve accuracy and performance. In this work, we propose an algorithm, DualDICE, for estimating these quantities. In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset. In addition to providing theoretical guarantees, we present an empirical study of our algorithm applied to off-policy policy evaluation and find that our algorithm significantly improves accuracy compared to existing techniques.
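To make the quantity in the abstract concrete, here is a minimal tabular sketch of what a discounted stationary distribution correction is and how it is used for off-policy policy evaluation. It is not the DualDICE algorithm: it computes the ratios exactly from a known toy model, whereas DualDICE estimates them from data alone. The MDP, the policy, the uniform dataset distribution, and every name below are made up for illustration.

```python
# Toy MDP with 2 states and 2 actions. All numbers are hypothetical.
GAMMA = 0.9
STATES, ACTIONS = (0, 1), (0, 1)
PAIRS = [(s, a) for s in STATES for a in ACTIONS]

# P[(s, a)][s2]: probability of landing in s2 after taking a in s.
P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
     (1, 0): [0.7, 0.3], (1, 1): [0.1, 0.9]}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 2.0}  # rewards r(s, a)
BETA = [1.0, 0.0]                  # initial state distribution
PI = [[0.3, 0.7], [0.6, 0.4]]      # target policy, PI[s][a]
D_DATA = {p: 0.25 for p in PAIRS}  # dataset distribution d^D (assumed uniform)


def discounted_occupancy(n_iters=2000):
    """Iterate d(s,a) = (1-g) beta(s) pi(a|s) + g pi(a|s) sum_q d(q) P(s|q)."""
    d = {p: 0.0 for p in PAIRS}
    for _ in range(n_iters):
        d = {
            (s, a): (1 - GAMMA) * BETA[s] * PI[s][a]
            + GAMMA * PI[s][a] * sum(d[q] * P[q][s] for q in PAIRS)
            for (s, a) in PAIRS
        }
    return d


def q_values(n_iters=2000):
    """Standard policy evaluation for Q^pi, used only as a cross-check."""
    q = {p: 0.0 for p in PAIRS}
    for _ in range(n_iters):
        q = {
            (s, a): R[(s, a)]
            + GAMMA * sum(P[(s, a)][s2] * PI[s2][a2] * q[(s2, a2)]
                          for s2 in STATES for a2 in ACTIONS)
            for (s, a) in PAIRS
        }
    return q


d_pi = discounted_occupancy()

# Correction ratio: how much more (or less) likely the target policy is to
# visit (s, a) than the dataset is to contain it.
w = {p: d_pi[p] / D_DATA[p] for p in PAIRS}

# Off-policy estimate: reweight rewards drawn from the dataset distribution.
rho_weighted = sum(D_DATA[p] * w[p] * R[p] for p in PAIRS)

# Cross-check: normalized value (1-g) * E[Q^pi(s0, a0)] computed directly.
q = q_values()
rho_direct = (1 - GAMMA) * sum(BETA[s] * PI[s][a] * q[(s, a)] for (s, a) in PAIRS)
```

The two values agree because the normalized discounted return of a policy equals the expected reward under its discounted stationary distribution. With only a fixed dataset, `d_pi` is not available, which is why an estimator for `w` that does not need the behavior policy is useful.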

behavior-agnostic estimation, dualdice, stationary distribution correction, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)
