AITopics | weak regret

weak regret

On Weak Regret Analysis for Dueling Bandits

Neural Information Processing Systems, Feb-12-2026, 08:34:19 GMT

When the optimality gap is negligible, we propose another algorithm that outperforms our first algorithm, highlighting the subtlety of this dueling bandit problem.

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
North America > United States (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.49)

On Weak Regret Analysis for Dueling Bandits

Neural Information Processing Systems, Oct-10-2025, 01:12:54 GMT

When the optimality gap is negligible, we propose another algorithm that outperforms our first algorithm, highlighting the subtlety of this dueling bandit problem.

algorithm, condorcet winner, weak regret, (14 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
North America > United States (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents

Xia, Fanzeng, Liu, Hao, Yue, Yisong, Li, Tongxin

arXiv.org Artificial Intelligence, Jul-1-2024

In-context decision-making is an important capability of artificial general intelligence, which Large Language Models (LLMs) have effectively demonstrated in various scenarios. However, LLMs often face challenges when dealing with numerical contexts, and limited attention has been paid to evaluating their performance through preference feedback generated by the environment. This paper investigates the performance of LLMs as decision-makers in the context of Dueling Bandits (DB). We first evaluate the performance of LLMs by comparing GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo against established DB algorithms. Our results reveal that LLMs, particularly GPT-4-Turbo, quickly identify the Condorcet winner, thus outperforming existing state-of-the-art algorithms in terms of weak regret. Nevertheless, LLMs struggle to converge even when explicitly prompted to do so, and are sensitive to prompt variations. To overcome these issues, we introduce an LLM-augmented algorithm, IF-Enhanced LLM, which takes advantage of both in-context decision-making capabilities of LLMs and theoretical guarantees inherited from classic DB algorithms. The design of such an algorithm sheds light on how to enhance trustworthiness for LLMs used in decision-making tasks where performance robustness matters. We show that IF-Enhanced LLM has theoretical guarantees on both weak and strong regret. Our experimental results validate that IF-Enhanced LLM is robust even with noisy and adversarial prompts.
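
To make the two regret notions in this abstract concrete, here is a minimal sketch of how weak and strong regret might be computed for a sequence of duels. It is not taken from the paper: the function names, the example preference matrix, and the choice of averaging the two gaps for strong regret are assumptions for illustration (some authors use the sum or the maximum of the gaps instead).

```python
import numpy as np

def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 0.5, or None."""
    K = P.shape[0]
    for i in range(K):
        if all(P[i, j] > 0.5 for j in range(K) if j != i):
            return i
    return None

def dueling_regrets(P, duels):
    """Cumulative (weak, strong) regret for a list of duels (i, j).

    P[i, j] is the probability that arm i beats arm j (hypothetical input).
    Assumed convention: gap_i = P[winner, i] - 0.5,
    weak regret per round = min(gap_i, gap_j),
    strong regret per round = (gap_i + gap_j) / 2.
    """
    P = np.asarray(P, dtype=float)
    w = condorcet_winner(P)
    if w is None:
        raise ValueError("preference matrix has no Condorcet winner")
    gaps = P[w] - 0.5  # gaps[w] is 0 when P[w, w] == 0.5
    weak = sum(min(gaps[i], gaps[j]) for i, j in duels)
    strong = sum((gaps[i] + gaps[j]) / 2 for i, j in duels)
    return float(weak), float(strong)

# Illustrative 3-arm instance in which arm 0 is the Condorcet winner.
P_example = np.array([[0.5, 0.6, 0.8],
                      [0.4, 0.5, 0.7],
                      [0.2, 0.3, 0.5]])
weak, strong = dueling_regrets(P_example, [(1, 2), (0, 2), (0, 0)])
print(weak, strong)
```

Under this convention a round incurs no weak regret as soon as the Condorcet winner is one of the two arms played, whereas strong regret vanishes only when the winner is dueled against itself.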

algorithm, if-enhanced llm, llm, (12 more...)

arXiv.org Artificial Intelligence

2407.01887

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry: Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Definition of Non-Stationary Bandits

Liu, Yueyang, Kuang, Xu, Van Roy, Benjamin

arXiv.org Artificial Intelligence, Jul-28-2023

Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits for which the reward distribution changes over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; this ambiguity arises in the existing definition's dependence on the latent sequence of reward distributions. Moreover, the definition has given rise to two widely used notions of regret: the dynamic regret and the weak regret. These notions are not indicative of qualitative agent performance in some bandits. Additionally, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. Our new definition provides a unified approach, applicable seamlessly to both Bayesian and frequentist formulations of bandits. Furthermore, our definition ensures consistent classification of two bandits offering agents indistinguishable experiences, categorizing them as either both stationary or both non-stationary. This advancement provides a more robust framework for non-stationary bandit learning.
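
The two regret notions named in this abstract can be contrasted with a small sketch. It assumes the usual reading in which dynamic regret compares the agent to the best arm at every time step, while weak regret compares it to the best single fixed arm in hindsight; the function names and the table of mean rewards `mu` are hypothetical and not taken from the paper.

```python
import numpy as np

def dynamic_regret(mu, actions):
    """Sum over time of (best mean reward at step t) - (mean reward of the chosen arm)."""
    mu = np.asarray(mu, dtype=float)        # mu[t, a]: mean reward of arm a at step t
    actions = np.asarray(actions)
    collected = mu[np.arange(len(actions)), actions]
    return float(np.sum(mu.max(axis=1) - collected))

def weak_regret(mu, actions):
    """(Total mean reward of the best fixed arm in hindsight) - (total of the chosen arms)."""
    mu = np.asarray(mu, dtype=float)
    actions = np.asarray(actions)
    collected = mu[np.arange(len(actions)), actions]
    return float(mu.sum(axis=0).max() - collected.sum())

# Illustrative environment whose best arm alternates every step.
mu_example = [[1, 0], [0, 1], [1, 0], [0, 1]]
always_arm_0 = [0, 0, 0, 0]
print(dynamic_regret(mu_example, always_arm_0), weak_regret(mu_example, always_arm_0))
```

In this toy environment an agent that never switches arms has zero weak regret but positive dynamic regret, which illustrates how the two notions can rank the same behavior very differently.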

bandit, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2302.12202

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Non-Stationary Dueling Bandits

Kolpaczki, Patrick, Bengs, Viktor, Hüllermeier, Eyke

arXiv.org Machine Learning, Feb-2-2022

We study the non-stationary dueling bandits problem with $K$ arms, where the time horizon $T$ consists of $M$ stationary segments, each of which is associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, despite preference matrices and segment lengths being unknown. We propose the $\mathrm{Beat\, the\, Winner\, Reset}$ algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-the-art algorithms. We also show a regret bound for the non-stationary case, without requiring knowledge of $M$ or $T$. We further propose and analyze two meta-algorithms, $\mathrm{DETECT}$ for weak regret and $\mathrm{Monitored\, Dueling\, Bandits}$ for strong regret, both based on a detection-window approach that can incorporate any dueling bandit algorithm as a black-box algorithm. Finally, we prove a worst-case lower bound for expected weak regret in the non-stationary case.
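
The setting described in this abstract can be illustrated with a short sketch of binary weak regret over stationary segments. This is not the Beat the Winner Reset algorithm or either meta-algorithm; it only scores a given sequence of duels, assuming that a round costs 1 when neither selected arm is the Condorcet winner of the current segment and 0 otherwise. The function names, segment lengths, and preference matrices are made up for illustration.

```python
import numpy as np

def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 0.5, or None."""
    K = P.shape[0]
    for i in range(K):
        if all(P[i, j] > 0.5 for j in range(K) if j != i):
            return i
    return None

def binary_weak_regret(segments, duels):
    """Cumulative binary weak regret of `duels` over piecewise-stationary segments.

    segments: list of (length, preference_matrix) pairs, one per stationary segment.
    duels:    list of (i, j) arm pairs, one per time step.
    """
    winners = []
    for length, P in segments:
        w = condorcet_winner(np.asarray(P, dtype=float))
        if w is None:
            raise ValueError("a segment has no Condorcet winner")
        winners.extend([w] * length)
    if len(winners) != len(duels):
        raise ValueError("number of duels must equal the total segment length")
    # A round is regret-free when the segment's winner is one of the two arms.
    return sum(0 if w in (i, j) else 1 for w, (i, j) in zip(winners, duels))

# Two hypothetical segments: arm 0 wins in the first, arm 1 in the second.
P_first = [[0.5, 0.6, 0.8], [0.4, 0.5, 0.7], [0.2, 0.3, 0.5]]
P_second = [[0.5, 0.3, 0.6], [0.7, 0.5, 0.8], [0.4, 0.2, 0.5]]
total = binary_weak_regret([(2, P_first), (2, P_second)],
                           [(0, 1), (1, 2), (0, 2), (1, 1)])
print(total)
```

Because the winner changes between segments, a pair that was regret-free before the change point can start incurring regret afterwards, which is why the learner has to detect changes without knowing the segment lengths.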

probability, time step, weak regret, (16 more...)

arXiv.org Machine Learning

2202.00935

Country:

North America > United States (0.04)
Europe > Germany > North Rhine-Westphalia (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Data Science > Data Mining > Big Data (0.69)

Memory-Constrained No-Regret Learning in Adversarial Bandits

Xu, Xiao, Zhao, Qing

arXiv.org Machine Learning, Feb-26-2020

An adversarial bandit problem with memory constraints is studied where only the statistics of a subset of arms can be stored. A hierarchical learning policy that requires only a sublinear order of memory space in terms of the number of arms is developed. Its sublinear regret orders with respect to the time horizon are established for both weak regret and shifting regret. This work appears to be the first on memory-constrained bandit problems under the adversarial setting.

algorithm, memory-constrained no-regret learning, sequence, (10 more...)

arXiv.org Machine Learning

2002.11804

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.89)
