AITopics | off-policy learning

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown importance in various important real-world applications, such as search engines and recommender systems. While the ground-truth logging policy is usually unknown, previous work simply takes its estimated value for the off-policy learning, ignoring the negative impact from both high bias and high variance resulted from such an estimator. And these impact is often magnified on samples with small and inaccurately estimated logging probabilities. The contribution of this work is to explicitly model the uncertainty in the estimated logging policy, and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning, with a theoretical convergence guarantee. Experiment results on the synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy, when compared against an extensive list of state-of-the-art baselines.

name change, off-policy learning, reweighting, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.91)

Add feedback

Provably Efficient Neural GTD for Off-Policy Learning

Neural Information Processing SystemsDec-24-2025, 04:56:09 GMT

This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using $m$ neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as $m \rightarrow \infty$.

name change, off-policy learning, provably efficient neural gtd, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Reliable Off-Policy Learning for Dosage Combinations

Neural Information Processing SystemsNov-20-2025, 13:23:26 GMT

Existing work for this task has modeled the effect of multiple treatments independently, while estimating the joint effect has received little attention but comes with non-trivial challenges. In this paper, we propose a novel method for reliable off-policy learning for dosage combinations.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-9-2025, 14:09:05 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This is a very well-written paper that explores the use of weighted importance sampling to speed up learning in off-policy LSTD-type algorithms. The theoretical results are solid and what one would expect. The computational results are striking. The technique could serve as a useful component in design of RL algorithms. Q2: Please summarize your review in 1-2 sentences The paper is very well-written and presents a useful idea validated by striking computational results.

algorithm, function approximation, value function approximation, (11 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback

Weighted importance sampling for off-policy learning with linear function approximation

A. Rupam Mahmood, Hado P. van Hasselt, Richard S. Sutton

Neural Information Processing SystemsOct-9-2025, 14:09:03 GMT

Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted importance sampling, does not carry over easily to function approximation and, because of this, it is not utilized in existing off-policy learning algorithms. In this paper, we take two steps toward bridging this gap. First, we show that weighted importance sampling can be viewed as a special case of weighting the error of individual training samples, and that this weighting has theoretical and empirical benefits similar to those of weighted importance sampling. Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(). We show empirically that our new WIS-LSTD() algorithm can result in much more rapid and reliable convergence than conventional off-policy LSTD() (Y u 2010, Bertsekas & Y u 2009).

algorithm, function approximation, wis-lstd, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)

Add feedback

Uncertainty-Aware Instance Reweighting for Off-Policy Learning

Neural Information Processing SystemsOct-9-2025, 10:39:12 GMT

IPS is the probability ratio of taking an action between the learning and logging policies.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Reliable Off-Policy Learning for Dosage Combinations

Neural Information Processing SystemsOct-9-2025, 08:40:51 GMT

Existing work for this task has modeled the effect of multiple treatments independently, while estimating the joint effect has received little attention but comes with non-trivial challenges. In this paper, we propose a novel method for reliable off-policy learning for dosage combinations.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: