LinUCB
Spectral bandits for smooth graph functions
Valko, Michal, Munos, Rémi, Kveton, Branislav, Kocák, Tomáš
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node, and its expected rating is similar to that of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
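The effective dimension the abstract introduces can be illustrated numerically. The sketch below uses a definition common in the spectral-bandit literature — the largest $d$ such that $(d-1)\lambda_d \le T/\log(1+T/\lambda)$, where $\lambda_1 \le \dots \le \lambda_N$ are eigenvalues of the regularized graph Laplacian. The path graph, horizon $T$, and regularizer are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def effective_dimension(eigenvalues, T, lam=1.0):
    """Largest d with (d - 1) * lambda_d <= T / log(1 + T / lam).

    `eigenvalues` are the (regularized) graph Laplacian eigenvalues.
    Since the eigenvalues are sorted ascending, (d - 1) * lambda_d is
    nondecreasing in d, so a single sweep finds the largest valid d.
    """
    eig = np.sort(eigenvalues)
    budget = T / np.log(1.0 + T / lam)
    d = 1
    for i, lam_i in enumerate(eig, start=1):
        if (i - 1) * lam_i <= budget:
            d = i
    return d

# Toy graph: a path of N nodes; combinatorial Laplacian L = D - W.
N = 100
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
eig = np.linalg.eigvalsh(L)

# Regularize the spectrum (shift by lam) before measuring the dimension.
d_eff = effective_dimension(eig + 1.0, T=250)
print(d_eff)  # small relative to N = 100
```

The point of the quantity is exactly what the abstract claims: for graphs whose Laplacian spectrum grows quickly, `d_eff` stays far below the number of nodes, so regret bounds stated in terms of it do not scale with $N$.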
Statistical Inference under Adaptive Sampling with LinUCB
Fan, Wei, Tan, Kevin, Wei, Yuting
Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect certain parameters to be asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm, proving asymptotic normality of the limiting distribution of the estimation error, with convergence at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
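The setting the abstract analyzes can be reproduced in a few lines: LinUCB with ridge regression, rewards $r = \langle x, \theta^\star\rangle + \text{noise}$, and actions on the unit ball. The confidence width `beta`, the noise level, and the finite random discretization of the sphere below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam, beta, noise = 3, 2000, 1.0, 1.0, 0.1
theta_star = np.array([0.6, -0.3, 0.2])  # unknown parameter

A = lam * np.eye(d)   # regularized design matrix A_t = lam*I + sum x x^T
b = np.zeros(d)       # sum of r * x

for t in range(T):
    # Discretize the unit-ball action set with random unit vectors.
    X = rng.normal(size=(64, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # UCB score: predicted reward + width in the A^{-1} norm.
    width = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
    x = X[np.argmax(X @ theta_hat + beta * width)]

    r = x @ theta_star + noise * rng.normal()
    A += np.outer(x, x)
    b += r * x

err = np.linalg.norm(np.linalg.inv(A) @ b - theta_star)
print(err)  # estimation error shrinks as T grows
```

Inspecting the eigenvalues of `A` at the end of such a run shows the decomposition the abstract describes: one eigenvalue grows on the order of $T$ along the direction of `theta_star`, while the remaining bulk grows much more slowly.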
Review for NeurIPS paper: Adversarial Attacks on Linear Contextual Bandits
Summary and Contributions: The authors study the scope of adversarial attacks on linear contextual bandit algorithms, which have applications in a wide range of domains. They consider adversarial attacks on both the reward and the context, and analyze the robustness (or lack thereof) of several contextual linear bandit algorithms, including LinUCB, LinTS, and epsilon-greedy. Empirical evaluations on synthetic data and two real datasets examine the effect of attacks on these algorithms.
Strengths: The problem of analyzing the effect of adversarial attacks on bandit algorithms is interesting and well motivated, and the present work appears to be the first to analyze it for stochastic contextual linear bandits. The authors cover commonly studied bandit algorithms (LinUCB, LinTS, epsilon-greedy) and derive attack strategies, formulated as optimization problems, that fool these algorithms into playing a targeted suboptimal arm a majority of the time. Experiments are fairly detailed and reported on a large set of datasets, showing how attacks of different types and magnitudes affect the learning performance of existing techniques.
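The flavor of reward attack the review describes can be seen with a toy sketch: an attacker who subtracts a fixed amount from the observed reward whenever a non-target arm is pulled can steer even a simple epsilon-greedy learner toward a suboptimal target arm. The attack rule and all constants below are illustrative, not the paper's optimization-based attack on contextual algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.9, 0.5, 0.2])   # arm 0 is truly best
target = 2                          # attacker's desired (suboptimal) arm
T, eps, atk = 5000, 0.1, 0.8        # atk: corruption per non-target pull

counts = np.zeros(3)
sums = np.zeros(3)
for t in range(T):
    if rng.random() < eps or counts.min() == 0:
        a = int(rng.integers(3))            # explore
    else:
        a = int(np.argmax(sums / counts))   # exploit empirical means
    r = means[a] + 0.1 * rng.normal()
    if a != target:
        r -= atk   # attacker corrupts rewards of non-target arms
    counts[a] += 1
    sums[a] += r

print(counts)  # the target arm accumulates the most pulls
```

The corruption makes every non-target arm look worse than the target (corrupted means 0.1, -0.3 vs. the target's 0.2), so the learner's own exploitation does the attacker's work — the same mechanism, made adaptive and budget-optimal, underlies the attacks the paper formulates as optimization problems.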
Scalable LinUCB: Low-Rank Design Matrix Updates for Recommenders with Large Action Spaces
Shustova, Evgenia, Sheshukova, Marina, Samsonov, Sergey, Frolov, Evgeny
Linear contextual bandits, especially LinUCB, are widely used in recommender systems. However, their training, inference, and memory costs grow with the feature dimensionality and the size of the action space. The key bottleneck is the need to update, invert, and store a design matrix that absorbs contextual information from the interaction history. In this paper, we introduce Scalable LinUCB, an algorithm that enables fast and memory-efficient operations with the inverse regularized design matrix. We achieve this through a dynamical low-rank parametrization of its inverse Cholesky-style factors. We derive numerically stable rank-1 and batched updates that maintain the inverse without directly forming the entire matrix. To control memory growth, we employ a projector-splitting integrator for dynamical low-rank approximation, yielding an average per-step update cost of $O(dr)$ and memory of $O(dr)$ for approximation rank $r$. The inference complexity of the suggested algorithm is $O(dr)$ per action evaluation. Experiments on recommender-system datasets demonstrate the effectiveness of our algorithm.
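The rank-1 updates the abstract refers to build on a standard identity: by Sherman-Morrison, the inverse of $A + xx^\top$ can be obtained from $A^{-1}$ directly, without re-inverting. The sketch below shows only this full-matrix identity, not the paper's low-rank Cholesky-style factorization; the dimensions are illustrative.

```python
import numpy as np

def sherman_morrison_update(A_inv, x):
    """Return (A + x x^T)^{-1} given A^{-1}, in O(d^2) instead of O(d^3).

    Sherman-Morrison: (A + x x^T)^{-1}
        = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x).
    """
    Ax = A_inv @ x
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)

rng = np.random.default_rng(0)
d = 50
A = np.eye(d)        # ridge-regularized design matrix (lam = 1)
A_inv = np.eye(d)    # its inverse, maintained incrementally
for _ in range(200):
    x = rng.normal(size=d) / np.sqrt(d)   # observed feature vector
    A += np.outer(x, x)
    A_inv = sherman_morrison_update(A_inv, x)

# The incrementally maintained inverse matches a direct inversion.
print(np.max(np.abs(A_inv - np.linalg.inv(A))))
```

Even this $O(d^2)$ update stores the full $d \times d$ inverse; the paper's contribution is to keep only rank-$r$ factors of it, cutting both the per-step cost and the memory to $O(dr)$.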
Author response (fragment):
[...] problems to be compatible with multiple adversarial bandit algorithms, which allows us to obtain previously unattainable [...]
We would like to thank the reviewers for their time and comments. This is also the first result in the literature that can combine multiple types of model selection.
- "The selection of the range": The regret is multiplied by at most a factor of the number of bases.
- A) "I think it's not trivial to reproduce the results...": We would like more explanations as to why it would [...]. We believe we have provided all the details for our experiments.
- B) We would present the full description of the algorithm before Section 4.
- B) "...the instantaneous regret of Step 2 is 1/s times...": Let the cumulative regret of step 1 at round [...]. More details are in the proof in Lemma F1, page 24.
- C) "The assumption that base algorithms only have access to rewards of rounds when they are selected...": This is not [...]