linucb
- North America > United States (0.04)
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.49)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Statistical Inference under Adaptive Sampling with LinUCB
Fan, Wei, Tan, Kevin, Wei, Yuting
Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect that certain parameters are asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm, establishing asymptotic normality for the limiting distribution of the estimation error where the convergence occurs at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania (0.04)
- Workflow (0.93)
- Research Report > New Finding (0.65)
Review for NeurIPS paper: Adversarial Attacks on Linear Contextual Bandits
Summary and Contributions: Summary & Contributions: Authors study the scopes of adversarial attacks in linear contextual bandit algorithms which have applications in a wide range of domains. The authors consider adversarial attacks on both reward and the context and analyze the robustness (or lack of it) of various contextual linear bandit algorithms including LinUCB, LinTS, epsilon-greedy etc. Empirical evaluations are presented on various synthetic and two real datasets to examine the effect of attacks on these algorithms. Strengths: The problem of analyzing the effect of adversarial attacks on bandit algorithms are indeed interesting and well motivated, and present work is supposedly the first one to analyze this for stochastic contextual linear bandits. Authors also analyze some popularly studied bandit algorithms, like LinUCB, LinTS, epsilon-greedy, and showed the attacking strategies (as optimization problems) to fool above algorithms for playing some targeted suboptimal arm majority number of times. Experiments are fairly detailed and reported on a large set of datasets showing the effect on learning rate of existing techniques on different degree+type of attacks.
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.98)
Scalable LinUCB: Low-Rank Design Matrix Updates for Recommenders with Large Action Spaces
Shustova, Evgenia, Sheshukova, Marina, Samsonov, Sergey, Frolov, Evgeny
Linear contextual bandits, especially LinUCB, are widely used in recommender systems. However, its training, inference, and memory costs grow with feature dimensionality and the size of the action space. The key bottleneck becomes the need to update, invert and store a design matrix that absorbs contextual information from interaction history. In this paper, we introduce Scalable LinUCB, the algorithm that enables fast and memory efficient operations with the inverse regularized design matrix. We achieve this through a dynamical low-rank parametrization of its inverse Cholesky-style factors. We derive numerically stable rank-1 and batched updates that maintain the inverse without directly forming the entire matrix. To control memory growth, we employ a projector-splitting integrator for dynamical low-rank approximation, yielding average per-step update cost $O(dr)$ and memory $O(dr)$ for approximation rank $r$. Inference complexity of the suggested algorithm is $O(dr)$ per action evaluation. Experiments on recommender system datasets demonstrate the effectiveness of our algorithm.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- Asia > Singapore (0.04)
- (7 more...)
problems to be compatible with multiple adversarial bandit algorithms which allows us to obtain previously unattainable
We would like to thank the reviewers for their time. This is also the first result in literature that can combine multiple types of model selection. "The selection of the range": The regret is multiplied by at most a factor of the number of bases Thank you for your comments. B) We would present the full description of the algorithm before Section 4. C) "The assumption that Base algorithms only have access to rewards of rounds when they are selected...": This is not A) "I think it's not trivial to reproduce the results...": We would like more explanations as to why it would We believe we have provided all the details for our experiments. B) "...the instantaneous regret of Step 2 is 1/s times...": Let the cumulative regret of step 1 at round More details are in the proof in Lemma F1, page 24.
- North America > United States (0.04)
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.49)
Decentralized Contextual Bandits with Network Adaptivity
We consider contextual linear bandits over networks, a class of sequential decision-making problems where learning occurs simultaneously across multiple locations and the reward distributions share structural similarities while also exhibiting local differences. While classical contextual bandits assume either fully centralized data or entirely isolated learners, much remains unexplored in networked environments when information is partially shared. In this paper, we address this gap by developing two network-aware Upper Confidence Bound (UCB) algorithms, NetLinUCB and Net-SGD-UCB, which enable adaptive information sharing guided by dynamically updated network weights. Our approach decompose learning into global and local components and as a result allow agents to benefit from shared structure without full synchronization. Both algorithms incur lighter communication costs compared to a fully centralized setting as agents only share computed summaries regarding the homogeneous features. We establish regret bounds showing that our methods reduce the learning complexity associated with the shared structure from $O(N)$ to sublinear $O(\sqrt{N})$, where $N$ is the size of the network. The two algorithms reveal complementary strengths: NetLinUCB excels in low-noise regimes with fine-grained heterogeneity, while Net-SGD-UCB is robust to high-dimensional, high-variance contexts. We further demonstrate the effectiveness of our methods across simulated pricing environments compared to standard benchmarks.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (2 more...)
Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution
Poon, Manhin, Dai, XiangXiang, Liu, Xutong, Kong, Fang, Lui, John C. S., Zuo, Jinhang
Large language models (LLMs) exhibit diverse response behaviors, costs, and strengths, making it challenging to select the most suitable LLM for a given user query. We study the problem of adaptive multi-LLM selection in an online setting, where the learner interacts with users through multi-step query refinement and must choose LLMs sequentially without access to offline datasets or model internals. A key challenge arises from unstructured context evolution: the prompt dynamically changes in response to previous model outputs via a black-box process, which cannot be simulated, modeled, or learned. To address this, we propose the first contextual bandit framework for sequential LLM selection under unstructured prompt dynamics. We formalize a notion of myopic regret and develop a LinUCB-based algorithm that provably achieves sublinear regret without relying on future context prediction. We further introduce budget-aware and positionally-aware (favoring early-stage satisfaction) extensions to accommodate variable query costs and user preferences for early high-quality responses. Our algorithms are theoretically grounded and require no offline fine-tuning or dataset-specific training. Experiments on diverse benchmarks demonstrate that our methods outperform existing LLM routing strategies in both accuracy and cost-efficiency, validating the power of contextual bandits for real-time, adaptive LLM selection.
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)