AITopics | bandit problem

Collaborating Authors

bandit problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Neural Information Processing SystemsJun-19-2026, 14:46:10 GMT

We consider a stochastic multi-armed bandit problem with i.i.d.

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Europe (0.45)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Add feedback

Pareto Optimal Risk Measure Agnostic Distributional Bandits with Heavy-Tail Rewards

Neural Information Processing SystemsJun-19-2026, 14:32:19 GMT

This paper addresses the problem of multi-risk measure agnostic multi-armed bandits in heavy-tailed reward settings. We propose a framework that leverages novel deviation inequalities for the 1-Wasserstein distance to construct confidence intervals for Lipschitz risk measures. The distributional LCB (DistLCB) algorithm is introduced, which achieves asymptotic optimality by deriving the first lower bounds for risk measure aware bandits with explicit sub-optimality gap dependencies. The DistLCB is further extended to multi-risk objectives, which enables Pareto-optimal solutions that consider multiple aspects of reward distributions. Additionally, we provide a regret analysis that includes both gap-dependent and gap-independent bounds for multi-risk settings. Experiments validate the effectiveness of the proposed methods in synthetic and real-world applications.

artificial intelligence, data mining, machine learning, (23 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.93)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Banking & Finance > Trading (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.85)
Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Adaptive Learning Rates with Surrogate Probability for Follow-the-Perturbed-Leader

Lee, Jongyeong, Honda, Junya, Ito, Shinji, Kim, Chansoo

arXiv.org Machine LearningJun-5-2026

Follow-the-regularized-leader framework has shown effectiveness and flexibility in online learning problems, where the choice of learning rates are known to be crucial. Recently, adaptive learning rates defined in terms of the arm-selection probabilities, obtained by solving convex optimization, have achieved improved best-of-both-worlds (BOBW) guarantees in various bandit problems. In contrast, BOBW guarantees for its computationally efficient alternative, follow-the-perturbed-leader (FTPL), remain relatively limited since its optimization-free nature ironically makes the design of adaptive, probability-dependent learning rates non-trivial. To address this challenge, we propose an adaptive learning rate for FTPL by introducing surrogate probability functions that can be computed only from the available quantities, without requiring the exact probabilities. Based on these learning rates with surrogate functions, we provide the BOBW guarantee for FTPL with Pareto perturbations for any shape parameter $α>1$, generalizing prior results restricted to specific choices of $α=2$. We further show the BOBW guarantees for FTPL with adaptive learning rates in the bandit problem with expert advices. Our approach preserves the computational simplicity of FTPL while enabling probability-dependent adaptivity, and the surrogate-based methodology may be of independent interest in other algorithmic frameworks beyond FTPL and learning rate designs.

bobw guarantee, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2606.06043

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.87)

Add feedback

Double Thompson Sampling for Dueling Bandits

Huasen Wu, Xin Liu

Neural Information Processing SystemsMay-1-2026, 06:05:22 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
North America > United States > California (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

The Power of Adaptivity in Identifying Statistical Alternatives

Kevin G. Jamieson, Daniel Haas, Benjamin Recht

Neural Information Processing SystemsMay-1-2026, 05:47:07 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.68)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

0d561979f0f4bc6127cfcfe9c46ee205-Supplemental-Conference.pdf

Neural Information Processing SystemsMay-1-2026, 01:40:39 GMT

artificial intelligence, machine learning, sample complexity, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

Verification Based Solution for Structured MAB Problems

Zohar S. Karnin

Neural Information Processing SystemsApr-30-2026, 21:38:54 GMT

We consider the problem of finding the best arm in a stochastic Multi-armed Bandit (MAB) game and propose a general framework based on verification that applies to multiple well-motivated generalizations of the classic MAB problem. In these generalizations, additional structure is known in advance, causing the task of verifying the optimality of a candidate to be easier than discovering the best arm. Our results are focused on the scenario where the failure probability must be very low; we essentially show that in this high confidence regime, identifying the best arm is as easy as the task of verification. We demonstrate the effectiveness of our framework by applying it, and matching or improving the state-of-the art results in the problems of: Linear bandits, Dueling bandits with the Condorcet assumption, Copeland dueling bandits, Unimodal bandits and Graphical bandits.

data mining, information retrieval, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.52)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.32)

Add feedback

Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Neural Information Processing SystemsApr-30-2026, 08:52:32 GMT

As trained intelligent systems become increasingly pervasive, multi-agent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is still in its infancy. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.

artificial intelligence, bandit, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report (0.46)

Technology: