Accounting for Gaussian Process Imprecision in Bayesian Optimization
Rodemann, Julian, Augustin, Thomas
Bayesian optimization (BO) with Gaussian processes (GPs) as surrogate models is widely used to optimize analytically unknown and expensive-to-evaluate functions. In this paper, we propose Prior-mean-RObust Bayesian Optimization (PROBO), which outperforms classical BO on specific problems. First, we study the effect of the GP's prior specifications on the convergence of classical BO. We find that the prior's mean parameters have the highest influence on convergence among all prior components. In response to this result, we introduce PROBO as a generalization of BO that aims to make the method more robust to misspecification of the prior mean parameters. This is achieved by explicitly accounting for GP imprecision via a prior near-ignorance model. At the heart of this is a novel acquisition function, the generalized lower confidence bound (GLCB). We test our approach against classical BO on a real-world problem from materials science and observe that PROBO converges faster. Further experiments on multimodal and wiggly target functions confirm the superiority of our method.
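To make the role of the prior mean concrete, here is a minimal NumPy sketch, assuming a 1-D GP with a squared-exponential kernel and a constant prior mean. It computes the classical lower confidence bound and then sweeps the prior mean over an interval, which is one simple way to express imprecision in the spirit of a near-ignorance model. The functions `rbf`, `gp_posterior`, `lcb`, and `imprecise_lcb` and the weights `tau` and `rho` are hypothetical; how the imprecision term is combined into the score is an assumption and is not the paper's definition of GLCB.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel between 1-D point sets a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)


def gp_posterior(x_train, y_train, x_test, prior_mean, noise=1e-6):
    """GP posterior with a constant prior mean; returns (mean, std) at x_test."""
    K = rbf(x_train, x_train) + noise * np.eye(x_train.size)
    k_star = rbf(x_train, x_test)                    # shape (n_train, n_test)
    alpha = np.linalg.solve(K, y_train - prior_mean)
    mean = prior_mean + k_star.T @ alpha
    v = np.linalg.solve(K, k_star)
    var = 1.0 - np.sum(k_star * v, axis=0)           # rbf(x, x) = 1
    return mean, np.sqrt(np.maximum(var, 0.0))      # std does not depend on prior_mean


def lcb(mean, std, tau=2.0):
    # Classical lower confidence bound for minimisation: smaller is more promising.
    return mean - tau * std


def imprecise_lcb(x_train, y_train, x_test, mean_lo, mean_hi, tau=2.0, rho=1.0):
    """Hypothetical GLCB-style score over an interval of constant prior means."""
    # The posterior mean is affine in the prior mean, so its extremes over
    # [mean_lo, mean_hi] are attained at the two endpoints.
    m_lo, std = gp_posterior(x_train, y_train, x_test, mean_lo)
    m_hi, _ = gp_posterior(x_train, y_train, x_test, mean_hi)
    lower, upper = np.minimum(m_lo, m_hi), np.maximum(m_lo, m_hi)
    imprecision = upper - lower
    # Assumed combination: centre of the mean interval minus uncertainty and
    # imprecision terms; the paper's exact definition may differ.
    return 0.5 * (lower + upper) - tau * std - rho * imprecision, imprecision


x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = np.sin(x_train)
x_test = np.linspace(-4.0, 5.0, 10)
score, imprecision = imprecise_lcb(x_train, y_train, x_test, mean_lo=-1.0, mean_hi=1.0)
next_x = x_test[np.argmin(score)]                    # candidate for the next evaluation
print(next_x)
```

Because the posterior mean is affine in the constant prior mean, evaluating the two endpoints of the interval is enough. At the training points the imprecision collapses toward zero, while far from the data it approaches the width of the prior-mean interval, which is where a misspecified prior mean matters most.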
Online Learning in Contextual Bandits using Gated Linear Networks
Sezener, Eren, Hutter, Marcus, Budden, David, Wang, Jianan, Veness, Joel
We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). This algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well-suited to the online setting. Leveraging the data-dependent gating properties of the GLN, we are able to estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically evaluate GLCB against 9 state-of-the-art algorithms that leverage deep neural networks on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains a median first-place ranking despite being the only online method, and we further support these results with a theoretical study of its convergence properties.
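As a rough illustration of how data-dependent gating can carry uncertainty information, the sketch below implements a single GLN-style neuron: random hyperplanes over side information select one of several weight vectors, the selected weights geometrically mix input probabilities in logit space, and only those weights receive an online log-loss gradient step. The per-context visit counts and the `1/sqrt(n + 1)` bonus are a hypothetical stand-in for an uncertainty estimate, not the GLCB paper's exact procedure; `GatedNeuron` and its parameters are illustrative names.

```python
import numpy as np


def logit(p):
    return np.log(p) - np.log1p(-p)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedNeuron:
    """One halfspace-gated geometric-mixing neuron (illustrative sketch)."""

    def __init__(self, n_inputs, side_dim, n_hyperplanes=4, lr=0.1,
                 weight_bound=5.0, eps=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        # Random hyperplanes define the data-dependent gating (context) function.
        self.normals = rng.standard_normal((n_hyperplanes, side_dim))
        self.offsets = rng.standard_normal(n_hyperplanes)
        n_contexts = 2 ** n_hyperplanes
        # One weight vector per context, initialised to geometric averaging (1/k).
        self.weights = np.full((n_contexts, n_inputs), 1.0 / n_inputs)
        # Hypothetical visit counts per context, used for an exploration bonus.
        self.counts = np.zeros(n_contexts, dtype=int)
        self.lr, self.bound, self.eps = lr, weight_bound, eps

    def context(self, z):
        # Which side of each hyperplane z falls on, packed into an integer index.
        bits = (self.normals @ z > self.offsets).astype(int)
        return int(bits @ (2 ** np.arange(bits.size)))

    def predict(self, p, z):
        x = logit(np.clip(p, self.eps, 1 - self.eps))
        return float(sigmoid(self.weights[self.context(z)] @ x))

    def update(self, p, z, y):
        c = self.context(z)
        x = logit(np.clip(p, self.eps, 1 - self.eps))
        pred = sigmoid(self.weights[c] @ x)
        # Online gradient step on log loss; only the active context's weights move.
        self.weights[c] = np.clip(self.weights[c] - self.lr * (pred - y) * x,
                                  -self.bound, self.bound)
        self.counts[c] += 1

    def bonus(self, z):
        # Hypothetical UCB-style bonus: shrinks as the active context is revisited.
        return 1.0 / np.sqrt(self.counts[self.context(z)] + 1)


rng = np.random.default_rng(1)
neuron = GatedNeuron(n_inputs=2, side_dim=2)
for _ in range(2000):
    z = rng.standard_normal(2)
    y = int(z[0] > 0)                  # synthetic target: reward iff first feature > 0
    neuron.update(sigmoid(z), z, y)    # squashed features serve as input "probabilities"

probe = np.array([2.0, 0.0])
print(neuron.predict(sigmoid(probe), probe), neuron.bonus(probe))
```

Under these assumptions, a bandit agent could keep one such model per arm and pick the arm maximizing `predict + bonus`; a full GLN stacks many such neurons in layers, and the gating contexts come for free with each forward pass, which is why a count-style signal adds little overhead.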