Online Learning in Contextual Bandits using Gated Linear Networks
Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness
arXiv.org Artificial Intelligence
We introduce a new, fully online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). The algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well suited to the online setting. By leveraging the data-dependent gating properties of the GLN, we are able to estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically compare GLCB against 9 state-of-the-art algorithms that leverage deep neural networks on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains a median rank of first place despite being the only online method, and we further support these results with a theoretical study of its convergence properties.
Feb-21-2020
- Country:
- Asia > Afghanistan
- Parwan Province > Charikar (0.04)
- Europe > France
- Hauts-de-France > Nord > Lille (0.04)
- North America > United States
- District of Columbia > Washington (0.04)
- New York > New York County
- New York City (0.14)
- Texas > Travis County
- Austin (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Education > Educational Setting > Online (0.51)
- Technology: