Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Beygelzimer, Alina, Orabona, Francesco, Zhang, Chicheng
We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
Jan-17-2018
- Country:
- North America > United States
- New York
- Suffolk County > Stony Brook (0.04)
- New York County > New York City (0.04)
- California > San Diego County
- New York
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Genre:
- Research Report (0.50)
- Industry:
- Education > Educational Setting > Online (0.46)
- Technology: