best policy
Navigating to the Best Policy in Markov Decision Processes
We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We propose a problem-dependent lower bound on the average number of steps required before a correct answer can be given with probability at least $1-\delta$. We further provide the first algorithm with an instance-specific sample complexity in this setting. This algorithm addresses the general case of communicating MDPs; we also propose a variant with a reduced exploration rate (and hence faster convergence) under an additional ergodicity assumption. This work extends previous results relative to the \emph{generative setting}~\cite{pmlr-v139-marjani21a}, where the agent could at each step query the random outcome of any (state, action) pair. In contrast, we show here how to deal with the \emph{navigation constraints}, induced by the \emph{online setting}. Our analysis relies on an ergodic theorem for non-homogeneous Markov chains which we consider of wide interest in the analysis of Markov Decision Processes.
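To make the flavor of such bounds concrete, here is a hedged sketch of the typical shape of a problem-dependent lower bound of this kind; the notation ($\tau_\delta$, $T^*(\mathcal{M})$, the binary relative entropy $\mathrm{kl}$) is assumed for illustration and is not quoted from the paper.

```latex
% Typical shape of a problem-dependent lower bound for delta-correct
% identification (illustrative; notation assumed, not quoted from the paper).
\[
  \mathbb{E}_{\mathcal{M}}[\tau_\delta] \;\ge\; T^*(\mathcal{M})\,
  \mathrm{kl}(\delta,\, 1-\delta),
  \qquad
  \mathrm{kl}(a,b) \;=\; a\log\tfrac{a}{b} + (1-a)\log\tfrac{1-a}{1-b},
\]
% where tau_delta is the stopping time of a delta-correct algorithm and
% T^*(M) is the value of an instance-specific optimization over state-action
% visitation distributions compatible with the MDP's navigation constraints.
```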
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.
NFQ2.0: The CartPole Benchmark Revisited
Lange, Sascha, Hafner, Roland, Riedmiller, Martin
This article revisits the 20-year-old neural fitted Q-iteration (NFQ) algorithm on its classical CartPole benchmark. NFQ was a pioneering approach towards modern Deep Reinforcement Learning (Deep RL) in applying multi-layer neural networks to reinforcement learning for real-world control problems. We explore the algorithm's conceptual simplicity and its transition from online to batch learning, which contributed to its stability. Despite its initial success, NFQ required extensive tuning and was not easily reproducible on real-world control problems. We propose a modernized variant NFQ2.0 and apply it to the CartPole task, concentrating on a real-world system built from standard industrial components, to investigate and improve the learning process's repeatability and robustness. Through ablation studies, we highlight key design decisions and hyperparameters that enhance the performance and stability of NFQ2.0 over the original variant. Finally, we demonstrate how our findings can assist practitioners in reproducing and improving results and applying deep reinforcement learning more effectively in industrial contexts.
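Since NFQ is, at its core, batch fitted Q-iteration with a neural regressor, a minimal sketch of that pattern may help. The following is illustrative only: it assumes scikit-learn's MLPRegressor as the function approximator and Gymnasium's CartPole-v1 (reward-maximizing form, whereas the original NFQ minimizes costs), with hyperparameters chosen arbitrarily rather than taken from NFQ2.0.

```python
# Minimal fitted Q-iteration sketch in the spirit of NFQ (Riedmiller, 2005).
# Illustrative only: hyperparameters are NOT the NFQ2.0 settings.
import numpy as np
import gymnasium as gym
from sklearn.neural_network import MLPRegressor

env = gym.make("CartPole-v1")
n_actions = env.action_space.n
gamma = 0.99

def featurize(states, actions):
    # Concatenate state with a one-hot action encoding.
    onehot = np.eye(n_actions)[actions]
    return np.hstack([states, onehot])

# 1) Collect a fixed batch of transitions with a random policy.
S, A, R, S2, done = [], [], [], [], []
s, _ = env.reset(seed=0)
for _ in range(5000):
    a = env.action_space.sample()
    s2, r, terminated, truncated, _ = env.step(a)
    S.append(s); A.append(a); R.append(r); S2.append(s2); done.append(terminated)
    s, _ = env.reset() if (terminated or truncated) else (s2, None)
S, A, R, S2, done = map(np.array, (S, A, R, S2, done))

# 2) Fitted Q-iteration: re-fit the network on the full batch each round,
#    mirroring NFQ's batch supervised retraining.
q = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)
q.fit(featurize(S, A), R)  # initial fit: Q_0 ~ immediate reward
for _ in range(20):
    q_next = np.stack(
        [q.predict(featurize(S2, np.full(len(S2), a))) for a in range(n_actions)]
    )
    targets = R + gamma * (1 - done) * q_next.max(axis=0)  # bootstrap unless terminal
    q.fit(featurize(S, A), targets)
```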
Clustered KL-barycenter design for policy evaluation
Weissmann, Simon, Freihaut, Till, Vernade, Claire, Ramponi, Giorgia, Döring, Leif
In the context of stochastic bandit models, this article examines how to design sample-efficient behavior policies for the importance sampling evaluation of multiple target policies. From importance sampling theory, it is well established that sample efficiency is highly sensitive to the KL divergence between the target and importance sampling distributions. We first analyze a single behavior policy defined as the KL-barycenter of the target policies. Then, we refine this approach by clustering the target policies into groups with small KL divergences and assigning each cluster its own KL-barycenter as a behavior policy. This clustered KL-based policy evaluation (CKL-PE) algorithm provides a novel perspective on optimal policy selection. We prove upper bounds on the sample complexity of our method and demonstrate its effectiveness with numerical validation.
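As a rough illustration of the clustered-barycenter idea: assuming categorical target policies and a barycenter taken under forward KL (target relative to behavior), the barycenter reduces to the uniform mixture of the cluster's targets. The greedy clustering below on symmetrized pairwise KL is a hypothetical stand-in for the paper's procedure, not a reproduction of it.

```python
# Illustrative sketch of clustered KL-barycenter behavior-policy design.
# Assumptions (not from the paper): categorical policies over a finite action
# set; barycenter under forward KL(target || behavior), which is the mixture
# mean; simple greedy clustering on symmetrized pairwise KL.
import numpy as np
from itertools import combinations

def kl(p, q, eps=1e-12):
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)))

def kl_barycenter(policies):
    # argmin_b sum_i KL(p_i || b) over the simplex is the uniform mixture.
    return np.mean(policies, axis=0)

def cluster_policies(policies, threshold):
    # Greedy single-linkage merging while the closest pair of clusters
    # (in symmetrized KL) is below the threshold.
    clusters = [[i] for i in range(len(policies))]
    def dist(c1, c2):
        return min(0.5 * (kl(policies[i], policies[j]) + kl(policies[j], policies[i]))
                   for i in c1 for j in c2)
    while len(clusters) > 1:
        (a, b), d = min(
            ((pair, dist(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda x: x[1])
        if d >= threshold:
            break
        clusters[a] += clusters.pop(b)  # a < b, so popping b is safe
    return clusters

rng = np.random.default_rng(0)
targets = rng.dirichlet(np.ones(4), size=6)   # 6 target policies over 4 actions
groups = cluster_policies(targets, threshold=0.2)
behaviors = [kl_barycenter(targets[g]) for g in groups]
# Each target in group g is then evaluated with importance weights
# pi_i(a) / behaviors[g](a) under its cluster's behavior policy.
```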