AITopics | linear contextual bandit

linear contextual bandit

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction

Neural Information Processing Systems, Jun-22-2026, 17:17:35 GMT

We present an efficient algorithm for linear contextual bandits with adversarial losses and stochastic action sets. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without knowledge of the context distribution or access to a context simulator, the algorithm achieves $\tilde{O}(\min\{d^2\sqrt{T}, \sqrt{d^3 T \log K}\})$ regret and runs in $\mathrm{poly}(d, C, T)$ time, where $d$ is the feature dimension, $C$ is an upper bound on the number of linear constraints defining the action set in each round, $K$ is an upper bound on the number of actions in each round, and $T$ is the number of rounds. This resolves the open question by Liu et al. (2023) on whether one can obtain $\mathrm{poly}(d)\sqrt{T}$ regret in polynomial time independent of the number of actions. For the important class of combinatorial bandits with adversarial losses and stochastic action sets where the action sets can be described by a polynomial number of linear constraints, our algorithm is the first to achieve $\mathrm{poly}(d)\sqrt{T}$ regret in polynomial time, while no prior algorithm achieves even $o(T)$ regret in polynomial time to our knowledge. When a simulator is available, the regret bound can be improved to $\tilde{O}(d\sqrt{L})$, where $L$ is the cumulative loss of the best policy.

algorithm, artificial intelligence, machine learning, (16 more...)

Country: Europe > Netherlands (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
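
To see which term of the regret bound in the abstract above is the active one, the sketch below evaluates both. It assumes the two terms read $d^2\sqrt{T}$ and $\sqrt{d^3 T \log K}$, with constants and the logarithmic factors hidden by $\tilde{O}$ dropped, and that $K \ge 2$. The function names are hypothetical. Squaring both terms shows that the action-independent term is the smaller one exactly when $d \le \log K$.

```python
import math


def regret_terms(d, T, K):
    """Return the two terms inside the min (constants and log factors dropped)."""
    action_free = d ** 2 * math.sqrt(T)                   # d^2 * sqrt(T)
    action_dependent = math.sqrt(d ** 3 * T * math.log(K))  # sqrt(d^3 T log K)
    return action_free, action_dependent


def active_term(d, T, K):
    """Name the smaller term; equivalent to checking d <= log K."""
    action_free, action_dependent = regret_terms(d, T, K)
    if action_free <= action_dependent:
        return "d^2 sqrt(T)"
    return "sqrt(d^3 T log K)"


if __name__ == "__main__":
    # Many actions, low dimension: the bound does not depend on K.
    print(active_term(d=2, T=10_000, K=10 ** 6))
    # Few actions, high dimension: the log K term is smaller.
    print(active_term(d=50, T=10_000, K=10))
```

The comparison does not depend on $T$, since both terms scale as $\sqrt{T}$.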

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

Xu, Haonan, Li, Yingying

arXiv.org Machine Learning, Jun-19-2026

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.

artificial intelligence, contextual bandit, machine learning, (19 more...)

2606.20022

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)
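
The combination of set-membership estimation and optimism described in the SME-OFU abstract can be illustrated with a one-dimensional toy. This is not the paper's algorithm: it assumes a scalar parameter, scalar arm features, and a known noise bound `w_max`, and every name below is hypothetical. With bounded noise, each observation `(x, r)` restricts the parameter to an interval, and the optimistic choice is the arm with the largest best-case reward over the surviving interval.

```python
def update_interval(lo, hi, x, r, w_max):
    """Intersect [lo, hi] with {theta : |r - x * theta| <= w_max}."""
    if x == 0:
        return lo, hi  # the observation says nothing about theta
    a, b = (r - w_max) / x, (r + w_max) / x
    if a > b:
        a, b = b, a  # dividing by a negative x flips the endpoints
    return max(lo, a), min(hi, b)


def optimistic_arm(lo, hi, arms):
    """Pick the arm whose best-case reward over theta in [lo, hi] is largest."""
    # x * theta is linear in theta, so its maximum sits at an endpoint.
    return max(arms, key=lambda x: max(x * lo, x * hi))


if __name__ == "__main__":
    lo, hi = -10.0, 10.0  # assumed prior bounds on theta
    for x, r in [(2.0, 3.1), (-1.0, -1.4)]:
        lo, hi = update_interval(lo, hi, x, r, w_max=0.5)
    print(lo, hi)
    print(optimistic_arm(lo, hi, [-1.0, 0.5, 1.0]))
```

In higher dimensions the same intersection yields a polytope rather than an interval, which is where the computational work of a set-membership method lies.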

Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Neural Information Processing Systems, Apr-30-2026, 08:52:32 GMT

As trained intelligent systems become increasingly pervasive, multi-agent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is still in its infancy. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.

artificial intelligence, bandit, machine learning, (17 more...)

Country: Europe (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
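
For readers unfamiliar with the UCB rule that H-LINUCB builds on, here is a minimal single-agent LinUCB sketch. It is the textbook ridge-regression variant, not H-LINUCB itself: there is no central server, no second agent, and no heterogeneity parameter. The class name and the defaults `alpha=1.0` and `lam=1.0` are assumptions made for illustration.

```python
import math


class LinUCB:
    """Textbook LinUCB with a ridge-regression estimate (single agent)."""

    def __init__(self, d, alpha=1.0, lam=1.0):
        self.alpha = alpha  # width of the confidence bonus
        # Inverse of A = lam * I + sum of x x^T, kept up to date incrementally.
        self.A_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)]
                      for i in range(d)]
        self.b = [0.0] * d  # sum of reward * x

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def ucb(self, x):
        theta = self._matvec(self.A_inv, self.b)  # ridge estimate
        Ax = self._matvec(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(a * xi for a, xi in zip(Ax, x)), 0.0))
        return mean + self.alpha * width

    def select(self, arms):
        """Return the index of the arm with the largest upper confidence bound."""
        return max(range(len(arms)), key=lambda i: self.ucb(arms[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv (A_inv is symmetric).
        Ax = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += reward * x[i]


if __name__ == "__main__":
    arms = [[1.0, 0.0], [0.0, 1.0]]
    bandit = LinUCB(d=2)
    for _ in range(3):
        bandit.update(arms[0], 0.0)  # arm 0 keeps returning zero reward
    print(bandit.select(arms))       # the unexplored arm now looks better
```

In a multi-agent variant, the statistics `A_inv` and `b` are what agents would plausibly share through a server; how much to trust shared data under heterogeneity is the question the abstract addresses.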

c4e380fb74dec9da9c7212e834657aa9-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 17:03:35 GMT

artificial intelligence, communication cost, machine learning, (16 more...)

Country: North America > United States > Texas (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Neural Information Processing Systems, Apr-27-2026, 01:58:01 GMT

Linear contextual bandits represent a fundamental class of models with numerous real-world applications, and it is critical to develop algorithms that can effectively manage noise with unknown variance while ensuring provable guarantees for both worst-case constant-variance noise and deterministic reward scenarios.

artificial intelligence, data mining, machine learning, (20 more...)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Parallelizing Thompson Sampling

Neural Information Processing Systems, Apr-26-2026, 00:26:03 GMT

How can we make use of information parallelism in online decision making problems while efficiently balancing the exploration-exploitation trade-off? In this paper, we introduce a batch Thompson Sampling framework for two canonical online decision making problems, namely, stochastic multi-arm bandit and linear contextual bandit with finitely many arms. Over a time horizon T, our batch Thompson Sampling policy achieves the same (asymptotic) regret bound of a fully sequential one while carrying out only O(log T) batch queries. To achieve this exponential reduction, i.e., reducing the number of interactions from T to O(log T), our batch policy dynamically determines the duration of each batch in order to balance the exploration-exploitation trade-off. We also demonstrate experimentally that dynamic batch allocation dramatically outperforms natural baselines such as static batch allocations.

artificial intelligence, data mining, machine learning, (19 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.30)
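
The idea of batches whose length is decided on the fly can be sketched for the multi-armed case with Beta-Bernoulli Thompson Sampling. The abstract above does not state the batching rule, so the doubling rule used here (a batch closes when some arm's pull count reaches its next power of two) is an assumption made for illustration, as are the function name and the uniform Beta(1, 1) prior.

```python
import random


def batch_thompson(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with posteriors refreshed only at batch ends."""
    rng = random.Random(seed)
    k = len(true_means)
    wins, losses = [0] * k, [0] * k  # posterior counts, frozen within a batch
    pulls = [0] * k                  # pulls per arm, including the open batch
    threshold = [1] * k              # pull count at which an arm closes the batch
    pending = []                     # (arm, reward) pairs awaiting the batch query
    batches = 0

    def flush():
        # Reveal the rewards of the open batch and fold them into the posterior.
        for arm, reward in pending:
            wins[arm] += reward
            losses[arm] += 1 - reward
        pending.clear()

    for _ in range(horizon):
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        pending.append((arm, reward))
        pulls[arm] += 1
        if pulls[arm] >= threshold[arm]:  # assumed doubling rule
            flush()
            threshold[arm] *= 2
            batches += 1
    if pending:  # close the final, partially filled batch
        flush()
        batches += 1
    return batches, pulls


if __name__ == "__main__":
    batches, pulls = batch_thompson([0.2, 0.5, 0.8], horizon=1000)
    print(batches, pulls)
```

Under this rule each arm can close at most $\lfloor \log_2 T \rfloor + 1$ batches, so the number of batch queries grows logarithmically in the horizon $T$ for a fixed number of arms.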

8dbd2780192078711c0f31e10a819031-Paper-Conference.pdf

Neural Information Processing Systems, Mar-14-2026, 01:37:09 GMT

algorithm, decay rate, lac condition, (15 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Neural Information Processing Systems, Feb-19-2026, 10:21:24 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

4a6824f8f137e78f18e73d9cfc1d22ed-Paper-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 10:21:21 GMT

algorithm, bandit, inequality, (16 more...)

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Strategic Linear Contextual Bandits

Neural Information Processing Systems, Feb-18-2026, 06:41:35 GMT

Recommendation algorithms that select the most relevant item for sequentially arriving users or queries have become vital for navigating the internet and its many online platforms.

artificial intelligence, data mining, machine learning, (18 more...)

Country: