AITopics | multi-armed bandit algorithm


multi-armed bandit algorithm


Online Multi-Armed Bandits with Adaptive Inference

Neural Information Processing Systems | Feb-7-2026, 12:26:49 GMT

During online decision making in multi-armed bandits, one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.50)
Information Technology > Artificial Intelligence > Machine Learning (0.47)


A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

Neural Information Processing Systems | Dec-24-2025, 02:02:44 GMT

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB) policy is among the simplest optimism-based MAB algorithms that naturally adapts to this gap: for a horizon of play $n$, it achieves optimal $O(\log n)$ regret in instances with large gaps, and a near-optimal $O(\sqrt{n \log n})$ minimax regret when the gap can be arbitrarily small. This paper provides new results on the arm-sampling behavior of UCB, leading to several important insights. Among these, it is shown that arm-sampling rates under UCB are asymptotically deterministic, regardless of the problem complexity.
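
To make the optimism principle behind UCB concrete, here is a minimal sketch of the textbook UCB1 index rule on Bernoulli arms. It is an illustration only, not the exact policy or analysis of the paper above: the function name `ucb1`, the Bernoulli reward model, and the bonus constant 2 are assumptions chosen for the example.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pull counts, cumulative pseudo-regret).

    `means` are the true arm means, known only to the simulator (assumption).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise the estimates
        else:
            # optimism: empirical mean plus a confidence bonus that shrinks
            # as an arm is pulled more often
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return counts, regret
```

Calling `ucb1([0.2, 0.8], 2000)` returns how often each arm was sampled; with a large instance gap such as this one, the inferior arm should receive only a small share of the pulls.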

multi-armed bandit algorithm, name change, worst-case behavior, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.43)
Information Technology > Artificial Intelligence (0.43)


Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Neural Information Processing Systems | Sep-30-2025, 12:07:44 GMT

In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and "estimation of the largest mean" (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers.
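
The statistical effect behind the "estimation of the largest mean" bias can be illustrated with a small Monte Carlo sketch: the largest of several noisy sample means tends to overestimate the largest true mean. This is a generic illustration, not the paper's auction model or its bias-correction methods; the function name `elm_bias` and the Bernoulli click model are assumptions made for the example.

```python
import random


def elm_bias(true_means, samples_per_arm, trials, seed=0):
    """Monte Carlo estimate of E[max_i sample_mean_i] - max_i true_mean_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # draw an independent sample mean for every arm (hypothetical ads)
        sample_means = [
            sum(1.0 if rng.random() < p else 0.0 for _ in range(samples_per_arm))
            / samples_per_arm
            for p in true_means
        ]
        total += max(sample_means)
    return total / trials - max(true_means)
```

For example, `elm_bias([0.5] * 5, 20, 2000)` compares five equally good arms observed 20 times each; the returned difference is positive because selecting the maximum favours arms whose estimates happened to come out high.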

estimation bias, multi-armed bandit algorithm, name change, (4 more...)

Neural Information Processing Systems

Industry:

Marketing (1.00)
Information Technology > Services (0.94)

Technology:

Information Technology > Artificial Intelligence (0.92)
Information Technology > Enterprise Applications (0.63)
Information Technology > Data Science > Data Mining > Big Data (0.47)


BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System

Xu, Shenghao

arXiv.org Artificial Intelligence | Oct-24-2022

Multi-armed bandits (MAB) provide a principled online learning approach to balancing exploration and exploitation. Owing to their strong performance and their ability to learn from limited feedback without having to learn to act in multiple situations, multi-armed bandits are drawing widespread attention in applications such as recommender systems. Within recommender systems, collaborative filtering (CF) is arguably the earliest and most influential method. Crucially, new users and an ever-changing pool of recommended items are challenges that recommender systems need to address. The classical collaborative filtering approach trains the model offline and then tests it online, but this approach cannot handle dynamic changes in user preferences, the so-called cold start problem. So how can items be recommended effectively to users in the absence of effective information? To address these problems, a multi-armed bandit based collaborative filtering recommender system, named BanditMF, has been proposed. BanditMF is designed to address two challenges in multi-armed bandit algorithms and collaborative filtering: (1) how to solve the cold start problem for collaborative filtering when valid information is scarce, and (2) how to solve the sub-optimality of bandit algorithms in domains with strong social relations, which is caused by independently estimating the unknown parameters associated with each user and ignoring correlations between users.

algorithm, artificial intelligence, data mining, (16 more...)

arXiv.org Artificial Intelligence

2106.10898

Genre:

Workflow (0.67)
Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)


Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Lin, Baihan, Bouneffouf, Djallel, Cecchi, Guillermo

arXiv.org Artificial Intelligence | Sep-10-2020

The Prisoner's Dilemma mainly treats the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game, where we explore the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them in a tournament of the iterated prisoner's dilemma in which multiple agents compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and also allows us to study the capacity of these algorithms to fit human behaviors. Results suggest that considering the current situation to make decisions performs worst in this kind of social dilemma game. Multiple discoveries on online learning behaviors and clinical validations are stated.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2006.0658

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Leisure & Entertainment > Games (0.93)
Education > Educational Setting > Online (0.82)
(2 more...)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Xu, Min, Qin, Tao, Liu, Tie-Yan

Neural Information Processing Systems | Mar-19-2020, 09:46:35 GMT

estimation bias, multi-armed bandit algorithm, search advertising, (2 more...)

Neural Information Processing Systems

Industry:

Marketing (1.00)
Information Technology > Services (0.96)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence (1.00)


How to Increase Email Course Open Rate with Machine Learning

#artificialintelligence | Apr-3-2017, 03:47:05 GMT

I think we can all agree that split testing is an effective way to find out what works best and get more out of your existing traffic. It's extremely useful because it can be applied to many different things: subject lines and content of emails, landing pages, home pages, creatives for ads, and the list goes on. You can also find articles and case studies on split testing for almost anything except drip campaigns. That's because experimenting with automated emails takes a lot of time and preparation. With most marketing automation tools it is nearly impossible or too technical.

drip campaign, increase email course open rate, split testing, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)


Mortal Multi-Armed Bandits

Chakrabarti, Deepayan, Kumar, Ravi, Radlinski, Filip, Upfal, Eli

Neural Information Processing Systems | Dec-31-2009

We formulate and study a new variant of the $k$-armed bandit problem, motivated by e-commerce applications. In our model, arms have a (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard $k$-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online advertising, where ads have a limited lifetime due to, for example, the nature of their content and their campaign budget. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the ads' lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings.
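
To make the mortal-arm setting concrete, the sketch below simulates arms that expire at random and are replaced by fresh ones, with a plain epsilon-greedy baseline choosing among them. It is not the algorithm proposed in the paper: the function name `mortal_epsilon_greedy`, the per-round death probability (a geometric lifetime), the uniformly drawn payoffs, and the epsilon-greedy rule are all assumptions for illustration.

```python
import random


def mortal_epsilon_greedy(k, horizon, death_prob, epsilon=0.1, seed=0):
    """Epsilon-greedy on k Bernoulli arms that each expire with probability
    death_prob per round and are replaced by a fresh, unexplored arm.

    Returns (total reward collected, number of arm replacements).
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(k)]   # hidden payoff of each live arm
    counts = [0] * k
    sums = [0.0] * k
    total_reward = 0.0
    replacements = 0
    for _ in range(horizon):
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            arm = untried[0]                   # always try a new arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)             # explore
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i])  # exploit
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
        # each arm dies independently; its slot is refilled with a new arm
        # whose statistics start from scratch
        for i in range(k):
            if rng.random() < death_prob:
                means[i] = rng.random()
                counts[i] = 0
                sums[i] = 0.0
                replacements += 1
    return total_reward, replacements
```

Running `mortal_epsilon_greedy(5, 1000, 0.01)` shows why exploration never stops in this setting: every replacement resets an arm's statistics, so the learner keeps paying to evaluate newcomers.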

artificial intelligence, big data, data mining, (20 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Industry:

Marketing (0.89)
Information Technology > Services (0.54)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence (1.00)


Nearly Tight Bounds for the Continuum-Armed Bandit Problem

Kleinberg, Robert D.

Neural Information Processing Systems | Dec-31-2005

In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set.

algorithm, bandit problem, cost function, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology: