AITopics | contextual bandit algorithm

Collaborating Authors

contextual bandit algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

082e82cae0232f45f27fdd2612c31f8a-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 10:30:51 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.68)

Genre: Research Report (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Fairness in Learning: Classic and Contextual Bandits

Matthew Joseph, Michael Kearns, Jamie H. Morgenstern, Aaron Roth

Neural Information Processing SystemsApr-22-2026, 12:12:56 GMT

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on "chained" confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

OfflineContextualBanditswith HighProbabilityFairnessGuarantees

Neural Information Processing SystemsFeb-14-2026, 10:54:06 GMT

To demonstrate the versatility of our approach, we use multiple well-known and custom definitions of fairness.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England (0.04)
(2 more...)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

c2d550cf3b2e177deb2d1720fb1e2710-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 18:47:31 GMT

algorithm, dataset, learner, (15 more...)

Neural Information Processing Systems

Country:

Europe > Norway (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)

Add feedback

OnlineLearninginContextualBandits usingGatedLinearNetworks

Neural Information Processing SystemsFeb-10-2026, 20:00:48 GMT

Leveraging data-dependent gating properties of the GLN we are able to estimate prediction uncertainty with effectively zero algorithmic overhead.

artificial intelligence, machine learning, neuron, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits

Zhang, Qingwen, Wang, Wenjia

arXiv.org Machine LearningFeb-5-2026

Modern systems, such as digital platforms and service systems, increasingly rely on contextual bandits for online decision-making; however, their deployment can inadvertently create unfair exposure among arms, undermining long-term platform sustainability and supplier trust. This paper studies the contextual bandit problem under a uniform $(1-δ)$-fairness constraint, and addresses its unique vulnerabilities to strategic manipulation. The fairness constraint ensures that preferential treatment is strictly justified by an arm's actual reward across all contexts and time horizons, using uniformity to prevent statistical loopholes. We develop novel algorithms that achieve (nearly) minimax-optimal regret for both linear and smooth reward functions, while maintaining strong $(1-\tilde{O}(1/T))$-fairness guarantees, and further characterize the theoretically inherent yet asymptotically marginal "price of fairness". However, we reveal that such merit-based fairness becomes uniquely susceptible to signal manipulation. We show that an adversary with a minimal $\tilde{O}(1)$ budget can not only degrade overall performance as in traditional attacks, but also selectively induce insidious fairness-specific failures while leaving conspicuous regret measures largely unaffected. To counter this, we design robust variants incorporating corruption-adaptive exploration and error-compensated thresholding. Our approach yields the first minimax-optimal regret bounds under $C$-budgeted attack while preserving $(1-\tilde{O}(1/T))$-fairness. Numerical experiments and a real-world case demonstrate that our algorithms sustain both fairness and efficiency.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2602.04125

Country:

Asia > Singapore (0.40)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (0.46)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Add feedback

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

Neural Information Processing SystemsDec-23-2025, 17:37:31 GMT

The stochastic contextual bandit problem, which models the trade-off between exploration and exploitation, has many real applications, including recommender systems, online advertising and clinical trials. As many other machine learning algorithms, contextual bandit algorithms often have one or more hyper-parameters. As an example, in most optimal stochastic contextual bandit algorithms, there is an unknown exploration parameter which controls the trade-off between exploration and exploitation. A proper choice of the hyper-parameters is essential for contextual bandit algorithms to perform well. However, it is infeasible to use offline tuning methods to select hyper-parameters in contextual bandit environment since there is no pre-collected dataset and the decisions have to be made in real time. To tackle this problem, we first propose a two-layer bandit structure for auto tuning the exploration parameter and further generalize it to the Syndicated Bandits framework which can learn multiple hyper-parameters dynamically in contextual bandit environment. We derive the regret bounds of our proposed Syndicated Bandits framework and show it can avoid its regret dependent exponentially in the number of hyper-parameters to be tuned. Moreover, it achieves optimal regret bounds under certain scenarios. Syndicated Bandits framework is general enough to handle the tuning tasks in many popular contextual bandit algorithms, such as LinUCB, LinTS, UCB-GLM, etc. Experiments on both synthetic and real datasets validate the effectiveness of our proposed framework.

auto tuning hyper-parameter, contextual bandit algorithm, syndicated bandit framework, (5 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.59)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Practical considerations when designing an online learning algorithm for an app-based mHealth intervention

Gonzalez, Rachel T, Abbott, Madeline R, Nallamothu, Brahmajee, Hummel, Scott, Dorsch, Michael, Dempsey, Walter

arXiv.org Machine LearningNov-13-2025

The ubiquitous nature of mobile health (mHealth) technology has expanded opportunities for the integration of reinforcement learning into traditional clinical trial designs, allowing researchers to learn individualized treatment policies during the study. LowSalt4Life 2 (LS4L2) is a recent trial aimed at reducing sodium intake among hypertensive individuals through an app-based intervention. A reinforcement learning algorithm, which was deployed in one of the trial arms, was designed to send reminder notifications to promote app engagement in contexts where the notification would be effective, i.e., when a participant is likely to open the app in the next 30-minute and not when prior data suggested reduced effectiveness. Such an algorithm can improve app-based mHealth interventions by reducing participant burden and more effectively promoting behavior change. We encountered various challenges during the implementation of the learning algorithm, which we present as a template to solving challenges in future trials that deploy reinforcement learning algorithms. We provide template solutions based on LS4L2 for solving the key challenges of (i) defining a relevant reward, (ii) determining a meaningful timescale for optimization, (iii) specifying a robust statistical model that allows for automation, (iv) balancing model flexibility with computational cost, and (v) addressing missing values in gradually collected data.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2511.08719

Country: