An Efficient Bandit Algorithm for Online Multiclass Prediction

Neural Information Processing Systems 

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α.
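For concreteness, the following is a minimal sketch of one natural form an α-parameterized log-loss can take: the negative log-probability of the true label under a softmax over α-scaled scores, divided by α. This form is an assumption made for illustration and is not quoted from the text or from [AR09]; the function name `alpha_log_loss` and the plain score vector `scores` (standing in for the per-class linear scores) are likewise hypothetical.

```python
import math


def alpha_log_loss(scores, label, alpha):
    """Assumed form: -(1/alpha) * log softmax(alpha * scores)[label].

    scores: list of per-class real-valued scores (hypothetical input)
    label:  index of the true class
    alpha:  positive scalar parameterizing the loss
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    scaled = [alpha * s for s in scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return (log_norm - scaled[label]) / alpha


# Two classes with equal scores: the loss is log(2) / alpha.
print(alpha_log_loss([0.0, 0.0], label=0, alpha=1.0))
```

Under this assumed form, the loss is always nonnegative, and as α grows it approaches the largest score gap max_j (s_j − s_y) between any class j and the true class y.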