EU proposal to delay parts of its AI Act signals a policy shift that prioritises big tech over fairness

AIHub

The roll-out of the European Union's Artificial Intelligence Act has hit a critical turning point. The act establishes rules for how AI systems can be used within the European Union. It officially entered into force on August 1, 2024, although different rules come into effect at different times. The European Commission has now proposed delaying parts of the act until 2027. This follows intense pressure from tech companies and from the Trump administration.



A Proofs

Neural Information Processing Systems

In this proof, we use the notion of weighted exchangeability as defined in Section 3.2 of [27]. A.2 Proof of Proposition 4.2 The following proof is an adaptation of [14, Proposition 1] to our setting. To get from (32) to (33), we use Assumption 2 and Markov's inequality. B.1 Further comments on the differences between [14] and COPP In this subsection, we elaborate on the differences between our work and [14]. As mentioned in the main text, given that we are integrating out the action in Eq. 7, we are essentially able to use the full dataset when constructing the CP intervals.
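The step from (32) to (33) invokes Markov's inequality. For reference, its general form for a nonnegative random variable is the following; the specific random variable and bound used in the proof depend on Assumption 2, which is not reproduced in this excerpt:

```latex
% Markov's inequality: for a nonnegative random variable Z and any a > 0,
\Pr\left(Z \ge a\right) \;\le\; \frac{\mathbb{E}[Z]}{a}
```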



SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning

Luo, Wang, Li, Haoran, Zhang, Zicheng, Han, Congying, Lv, Jiayu, Guo, Tiande

arXiv.org Machine Learning

Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics, without direct real-world environment interactions. However, this method is inherently challenged by distribution shift. Previous approaches have primarily focused on tackling this issue by directly leveraging off-policy mechanisms and heuristic uncertainty in model dynamics, but these result in inconsistent objectives and lack a unified theoretical foundation. This paper offers a comprehensive analysis that disentangles the problem into two key components: model bias and policy shift. We provide both theoretical insights and empirical evidence to demonstrate how these factors lead to inaccuracies in value function estimation and impose implicit restrictions on policy learning. To address these challenges, we derive adjustment terms for model bias and policy shift within a unified probabilistic inference framework. These adjustments are seamlessly integrated into the vanilla reward function to create a novel Shifts-aware Reward (SAR), aimed at refining value learning and facilitating policy training. Furthermore, we introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL), a practical framework that efficiently trains classifiers to approximate the SAR for policy optimization. Empirically, we show that SAR effectively mitigates distribution shift, and SAMBO-RL demonstrates superior performance across various benchmarks, underscoring its practical effectiveness and validating our theoretical analysis.
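The abstract mentions training classifiers to approximate the shifts-aware reward. A minimal sketch of the underlying classifier-based density-ratio idea on toy 1-D "transitions": a logistic classifier distinguishing real from model-generated data yields a log-density-ratio via its odds, which can then correct a reward. The data, the hand-rolled logistic regression, the hyperparameters, and the simple additive adjustment are all illustrative assumptions, not the paper's exact SAR construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "transitions": real offline data vs. model rollouts with a mean shift.
real = rng.normal(0.0, 1.0, size=(500, 1))
model = rng.normal(0.5, 1.0, size=(500, 1))

X = np.vstack([real, model])
y = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = real, 0 = model

# Tiny logistic regression by gradient descent (a stand-in for the
# classifiers SAMBO-RL trains; learning rate and iterations are arbitrary).
w, b = np.zeros(1), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y                      # gradient of the logistic loss
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

def log_density_ratio(x):
    """log p_real(x) / p_model(x) recovered from the classifier odds."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return np.log(p) - np.log1p(-p)

def shifts_aware_reward(r, x, beta=1.0):
    # Illustrative additive correction: penalise states the model
    # over-represents relative to the real data.
    return r + beta * log_density_ratio(x)
```

The same odds-to-ratio trick underlies many density-ratio estimators; the paper derives its specific adjustment terms from a probabilistic inference framework rather than this ad-hoc penalty.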


OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Luo, Yu, Ji, Tianying, Sun, Fuchun, Zhang, Jianwei, Xu, Huazhe, Zhan, Xianyuan

arXiv.org Artificial Intelligence

Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy performance and high learning variance. In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. In light of this, we introduce a surrogate policy learning objective by considering the transition occupancy discrepancies and then cast it into a tractable min-max optimization problem through dual reformulation. Our method, dubbed Occupancy-Matching Policy Optimization (OMPO), features a specialized actor-critic structure equipped with a distribution discriminator and a small-size local buffer. We conduct extensive experiments based on the OpenAI Gym, Meta-World, and Panda Robots environments, encompassing policy shifts under stationary and nonstationary dynamics, as well as domain adaptation. The results demonstrate that OMPO outperforms the specialized baselines from different categories in all settings. We also find that OMPO exhibits particularly strong performance when combined with domain randomization, highlighting its potential in RL-based robotics applications.
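The dual reformulation above turns the surrogate objective into a tractable min-max problem. As a toy illustration of solving such a problem by simultaneous gradient descent-ascent (the function, step size, and iteration count are invented for the sketch and have nothing to do with OMPO's actual objective):

```python
# Toy convex-concave saddle: f(x, y) = (x - 1)^2 + x*y - y^2.
# The inner max over y has the closed form y* = x/2, so the outer
# objective is (x - 1)^2 + x^2/4, minimised at x = 0.8 -- a check
# that the iteration converges to the right saddle point.
x, y = 0.0, 0.0
eta = 0.05
for _ in range(5000):
    gx = 2 * (x - 1) + y   # gradient for the min player
    gy = x - 2 * y         # gradient for the max player
    x -= eta * gx          # descend in x
    y += eta * gy          # ascend in y
```

In practice, methods in this family replace the inner player with a learned discriminator and the outer player with the policy, alternating stochastic updates rather than exact gradients.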


Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits

Taufiq, Muhammad Faaiz, Doucet, Arnaud, Cornish, Rob, Ton, Jean-Francois

arXiv.org Machine Learning

Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation. However, current OPE methods, such as Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators, suffer from high variance, particularly in cases of low overlap between target and behavior policies or large action and context spaces. In this paper, we introduce a new OPE estimator for contextual bandits, the Marginal Ratio (MR) estimator, which focuses on the shift in the marginal distribution of outcomes $Y$ instead of the policies themselves. Through rigorous theoretical analysis, we demonstrate the benefits of the MR estimator compared to conventional methods like IPW and DR in terms of variance reduction. Additionally, we establish a connection between the MR estimator and the state-of-the-art Marginalized Inverse Propensity Score (MIPS) estimator, proving that MR achieves lower variance among a generalized family of MIPS estimators. We further illustrate the utility of the MR estimator in causal inference settings, where it exhibits enhanced performance in estimating Average Treatment Effects (ATE). Our experiments on synthetic and real-world datasets corroborate our theoretical findings and highlight the practical advantages of the MR estimator in OPE for contextual bandits.
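A toy numerical contrast between IPW and the marginal-ratio idea described above: both reweight logged outcomes toward the target policy, but MR weights by the ratio of *marginal outcome* densities rather than the policy ratio. The bandit setup, sample size, and closed-form Gaussian-mixture densities are illustrative assumptions; the paper's MR estimator estimates this ratio from data rather than assuming it known:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Toy bandit: two actions; behavior picks a=1 w.p. 0.5, target w.p. 0.9.
pb, pt = np.array([0.5, 0.5]), np.array([0.1, 0.9])
mu = np.array([0.0, 1.0])                 # outcome mean per action, unit variance
a = rng.choice(2, size=n, p=pb)
y = rng.normal(mu[a], 1.0)

# Classic IPW: weight each logged outcome by the policy ratio.
ipw = np.mean(pt[a] / pb[a] * y)

def mix_pdf(y, p):
    """Marginal outcome density under policy p (known Gaussian mixture)."""
    comps = np.exp(-0.5 * (y[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
    return comps @ p

# MR idea: weight by the ratio of marginal outcome densities instead.
mr = np.mean(mix_pdf(y, pt) / mix_pdf(y, pb) * y)

true_value = pt @ mu  # = 0.9 in this toy
```

Both estimators are unbiased here; the MR weights are a smooth function of the outcome, which is what drives the variance reduction the paper analyses.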


Conformal Off-Policy Prediction in Contextual Bandits

Taufiq, Muhammad Faaiz, Ton, Jean-Francois, Cornish, Rob, Teh, Yee Whye, Doucet, Arnaud

arXiv.org Artificial Intelligence

Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees. However, in many applications, the expectation may not be the best measure of performance as it does not capture the variability of the outcome. In addition, particularly in safety-critical settings, stronger guarantees than asymptotic correctness may be required. To address these limitations, we consider a novel application of conformal prediction to contextual bandits. Given data collected under a behavioral policy, we propose \emph{conformal off-policy prediction} (COPP), which can output reliable predictive intervals for the outcome under a new target policy. We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup, and empirically demonstrate the utility of COPP compared with existing methods on synthetic and real-world data.
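A minimal sketch of the weighted-quantile machinery behind predictive intervals of this kind: residual scores on calibration data are combined via a weighted empirical quantile, where importance weights would account for the shift from behavior to target policy. Here the weights are set to a hypothetical uniform vector (which recovers standard split conformal prediction); the data and the interval construction are illustrative, not COPP's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Calibration outcomes collected under the behavior policy (toy data).
y_cal = rng.normal(0.0, 1.0, size=1000)

# Hypothetical importance weights (target/behavior ratios in the real
# method); uniform weights reduce to ordinary split conformal.
w = np.ones_like(y_cal)

def weighted_quantile(scores, weights, q):
    """q-quantile of the weighted empirical distribution of scores,
    with a unit of extra mass reserved for the test point."""
    order = np.argsort(scores)
    s, wts = scores[order], weights[order]
    cdf = np.cumsum(wts) / (wts.sum() + 1.0)
    return s[np.searchsorted(cdf, q)]

# Two-sided ~90% predictive interval from absolute residual scores.
center = np.median(y_cal)
scores = np.abs(y_cal - center)
r = weighted_quantile(scores, w, 0.9)
interval = (center - r, center + r)
```

With non-uniform weights this is the weighted conformal construction; COPP's guarantees come from choosing the weights to match the policy shift in the contextual bandit setup.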