Randomized Exploration in Generalized Linear Bandits

Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

Jun-21-2019–arXiv.org Machine Learning

We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. GLM-FPL, a new algorithm proposed in this work, fits a GLM to a randomly perturbed history of past rewards. We prove a $\tilde{O}(d \sqrt{n} + d^2)$ upper bound on the $n$-round regret of GLM-TSL, where $d$ is the number of features. This is the first regret bound of a Thompson sampling-like algorithm in GLM bandits where the leading term is $\tilde{O}(d \sqrt{n})$. We apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show that they perform well empirically. In more complex models, GLM-FPL is significantly faster. Our results showcase the role of randomization, beyond posterior sampling, in exploration.
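The two sampling rules in the abstract can be illustrated for the logistic case with the pure-Python sketch below. It is not the authors' implementation. The function names, the L2 regularizer `lam`, the Newton-method fit, and the single perturbation scale `a` are assumptions made for illustration, and the paper's exact constants and perturbation scheme may differ. `glm_tsl_theta` draws a parameter from a Gaussian centered at the regularized maximum-likelihood estimate, with covariance `a**2` times the inverse Hessian (with `a = 1` this is the plain Laplace approximation). `glm_fpl_theta` adds Gaussian noise to every past reward and refits. Both then act greedily on the sampled parameter.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic mean function mu(z) = 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cholesky(A):
    # Lower-triangular L with A = L L^T; A must be symmetric positive definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    # Solve L y = b for lower-triangular L.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub_t(L, y):
    # Solve L^T x = y, reading the upper-triangular L^T directly from L.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def grad_hess(X, r, theta, lam):
    # Gradient and Hessian of the L2-regularized logistic negative log-likelihood.
    d = len(theta)
    g = [lam * t for t in theta]
    H = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x, y in zip(X, r):
        p = sigmoid(dot(x, theta))
        for i in range(d):
            g[i] += (p - y) * x[i]
            for j in range(d):
                H[i][j] += p * (1.0 - p) * x[i] * x[j]
    return g, H


def fit_glm(X, r, d, lam=1.0, iters=30):
    # Newton's method for the regularized maximum-likelihood estimate theta_hat.
    theta = [0.0] * d
    for _ in range(iters):
        g, H = grad_hess(X, r, theta, lam)
        L = cholesky(H)
        step = backward_sub_t(L, forward_sub(L, g))  # step = H^{-1} g
        theta = [t - s for t, s in zip(theta, step)]
    return theta


def glm_tsl_theta(X, r, d, a=1.0, lam=1.0, rng=random):
    # GLM-TSL-style draw: theta ~ N(theta_hat, a^2 H^{-1}), H = Hessian at theta_hat.
    theta_hat = fit_glm(X, r, d, lam)
    _, H = grad_hess(X, r, theta_hat, lam)
    L = cholesky(H)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    v = backward_sub_t(L, z)  # Cov(v) = L^{-T} L^{-1} = H^{-1}
    return [m + a * vi for m, vi in zip(theta_hat, v)]


def glm_fpl_theta(X, r, d, a=1.0, lam=1.0, rng=random):
    # GLM-FPL-style draw: refit after adding N(0, a^2) noise to every past reward.
    noisy = [y + rng.gauss(0.0, a) for y in r]
    return fit_glm(X, noisy, d, lam)


def choose_arm(arms, theta):
    # mu is increasing, so the arm maximizing x^T theta also maximizes mu(x^T theta).
    return max(range(len(arms)), key=lambda k: dot(arms[k], theta))


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # hypothetical feature vectors
    theta_star = [1.0, -1.0]  # hypothetical true parameter
    X, r = [], []
    for _ in range(100):
        k = choose_arm(arms, glm_fpl_theta(X, r, d=2, rng=rng))
        X.append(arms[k])
        r.append(1.0 if rng.random() < sigmoid(dot(arms[k], theta_star)) else 0.0)
    print("pulls per arm:", [X.count(arm) for arm in arms])
```

In this sketch `glm_fpl_theta` never factors a Hessian in order to sample; it only calls the fitting routine (Newton's method here, though another GLM or neural-network trainer could stand in). This is consistent with the abstract's observation that GLM-FPL is faster in more complex models.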

artificial intelligence, bandit, neural network

Country:
- North America > United States (0.14)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
