AITopics | dueling bandit problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Double Thompson Sampling for Dueling Bandits

Huasen Wu, Xin Liu

Neural Information Processing Systems | May-1-2026, 06:05:22 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
North America > United States > California (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Verification Based Solution for Structured MAB Problems

Zohar S. Karnin

Neural Information Processing Systems | Apr-30-2026, 21:38:54 GMT

We consider the problem of finding the best arm in a stochastic Multi-armed Bandit (MAB) game and propose a general framework based on verification that applies to multiple well-motivated generalizations of the classic MAB problem. In these generalizations, additional structure is known in advance, causing the task of verifying the optimality of a candidate to be easier than discovering the best arm. Our results are focused on the scenario where the failure probability must be very low; we essentially show that in this high confidence regime, identifying the best arm is as easy as the task of verification. We demonstrate the effectiveness of our framework by applying it to, and matching or improving the state-of-the-art results in, the following problems: linear bandits, dueling bandits with the Condorcet assumption, Copeland dueling bandits, unimodal bandits, and graphical bandits.

data mining, information retrieval, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.52)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.32)

Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Siddartha Y. Ramamohan, Arun Rajkumar, Shivani Agarwal

Neural Information Processing Systems | Apr-22-2026, 14:34:06 GMT

Recent work on deriving O(log T) anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show O(log T) anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.
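As a rough illustration of why Condorcet winners "do not always exist" while the Copeland set does, the sketch below computes both from a pairwise preference matrix. The function names, the matrix layout (`P[i][j]` is the probability that arm `i` beats arm `j`), and the two example matrices are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional, Set


def condorcet_winner(P: List[List[float]]) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2, if any."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None


def copeland_set(P: List[List[float]]) -> Set[int]:
    """Return the arms that beat the largest number of other arms."""
    n = len(P)
    scores = [
        sum(1 for j in range(n) if j != i and P[i][j] > 0.5)
        for i in range(n)
    ]
    best = max(scores)
    return {i for i, s in enumerate(scores) if s == best}


# Hypothetical cyclic preferences: 0 beats 1, 1 beats 2, 2 beats 0.
CYCLE = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
]

# Hypothetical transitive preferences: 0 beats 1 and 2, 1 beats 2.
LINEAR = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]

if __name__ == "__main__":
    print(condorcet_winner(CYCLE), copeland_set(CYCLE))
    print(condorcet_winner(LINEAR), copeland_set(LINEAR))
```

In the cyclic matrix no arm beats all others, so there is no Condorcet winner, yet the Copeland set is still non-empty because some arm always attains the maximum number of pairwise wins.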

artificial intelligence, data mining, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.28)
Europe (0.28)

Industry:

Government > Voting & Elections (0.64)
Leisure & Entertainment > Sports > Tennis (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Cognitive Science (0.69)

5e388103a391daabe3de1d76a6739ccd-Paper.pdf

Neural Information Processing Systems | Feb-19-2026, 14:28:41 GMT

algorithm, subset, winner-regret, (12 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.47)

78ccee9dfbcf84840165ab4093715969-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 01:39:08 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Factored Bandits

Julian Zimmert, Yevgeny Seldin

Neural Information Processing Systems | Feb-12-2026, 10:26:08 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, assumption, bandit, (15 more...)

Neural Information Processing Systems

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

On Weak Regret Analysis for Dueling Bandits

Neural Information Processing Systems | Feb-12-2026, 08:34:19 GMT

When the optimality gap is negligible, we propose another algorithm that outperforms our first algorithm, highlighting the subtlety of this dueling bandit problem.

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
North America > United States (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.49)

7873_an_asymptotically_optimal_batc (1)

Rohan Ghuge

Neural Information Processing Systems | Feb-11-2026, 14:27:59 GMT

algorithm, bandit problem, proceedings, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(6 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.49)

An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem

Neural Information Processing Systems | Dec-25-2025, 02:51:08 GMT

We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the fully adaptive setting, where the algorithm can make updates after every comparison. The batched dueling bandit problem is motivated by large-scale applications like web search ranking and recommendation systems, where performing sequential updates may be infeasible. In this work, we ask: is there a solution using only a few adaptive rounds that matches the asymptotic regret bounds of the best sequential algorithms for $K$-armed dueling bandits? We answer this in the affirmative under the Condorcet condition, a standard setting of the $K$-armed dueling bandit problem.
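The pairwise-comparison feedback described above can be sketched in a few lines. This is only an illustrative simulation, not the paper's batched algorithm: the preference matrix `P`, the helper names, and the per-round regret formula (the average of the Condorcet winner's advantage over the two arms played, one commonly used definition) are assumptions introduced here.

```python
import random
from typing import List

# Hypothetical preference matrix: P[i][j] is the probability that arm i
# beats arm j. Arm 0 is the Condorcet winner in this example.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]


def duel(P: List[List[float]], i: int, j: int, rng: random.Random) -> bool:
    """Simulate one noisy comparison; True means arm i won against arm j."""
    return rng.random() < P[i][j]


def pair_regret(P: List[List[float]], winner: int, i: int, j: int) -> float:
    """Regret of playing the pair (i, j) relative to the Condorcet winner."""
    return 0.5 * ((P[winner][i] - 0.5) + (P[winner][j] - 0.5))


def empirical_win_rate(P: List[List[float]], i: int, j: int,
                       rounds: int, seed: int = 0) -> float:
    """Estimate P[i][j] from repeated duels, as a learner would have to."""
    rng = random.Random(seed)
    wins = sum(duel(P, i, j, rng) for _ in range(rounds))
    return wins / rounds


if __name__ == "__main__":
    print(empirical_win_rate(P, 0, 1, rounds=10_000))
    print(pair_regret(P, winner=0, i=1, j=2))
```

Under this definition, playing the Condorcet winner against itself incurs zero regret, which is why sequential algorithms aim to converge to that pair; a batched learner would call something like `empirical_win_rate` once per batch rather than updating after every duel.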

asymptotically optimal batched algorithm, dueling bandit problem, name change, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence (1.00)
