Review for NeurIPS paper: Exploiting the Surrogate Gap in Online Multiclass Classification

Summary and Contributions: This paper studies online multiclass classification, in both the full-information and bandit settings, where the objective is to minimize regret with respect to the hinge loss (or the logistic loss or smooth hinge loss) of the best linear predictor in hindsight. The online algorithm itself is evaluated under the 0/1 loss, so "bandit feedback" (observing only the loss of the chosen action) amounts to learning only whether the prediction was correct. The paper presents a new algorithm with a particularly favorable combination of running time and regret bound in the bandit setting. One thing I was looking for but could not find (perhaps I missed it) is a discussion of what makes the multiclass setting special: the gap between the loss functions, e.g., as shown in Figure 1, also holds in the binary case.
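To make the surrogate gap concrete, here is a minimal sketch (my own illustration, not from the paper) comparing a multiclass hinge loss in the Crammer-Singer style against the 0/1 loss for a linear predictor; the paper's exact surrogate definition may differ. A correctly classified point with margin below 1 incurs zero 0/1 loss but positive hinge loss, which is the gap depicted in Figure 1:

```python
import numpy as np

def multiclass_hinge(W, x, y):
    # Crammer-Singer style multiclass hinge loss of linear predictor W
    # (an illustrative choice; the paper's surrogate may be defined differently)
    scores = W @ x
    margin = scores[y] - max(s for k, s in enumerate(scores) if k != y)
    return max(0.0, 1.0 - margin)

def zero_one(W, x, y):
    # 0/1 loss: 1 if the predicted class (argmax score) is wrong.
    # Under bandit feedback, only this value is revealed to the learner.
    return float(np.argmax(W @ x) != y)

# Rows of W are per-class weight vectors (hypothetical example values)
W = np.array([[1.0, 0.0],
              [0.5, 0.0],
              [-1.0, -1.0]])
x = np.array([1.0, 0.0])
y = 0  # true class

print(zero_one(W, x, y))          # prediction is correct: 0.0
print(multiclass_hinge(W, x, y))  # margin is 0.5 < 1, so hinge loss 0.5
```

Nothing in this construction uses more than two classes, which is the reviewer's point: the same gap between the surrogate and the 0/1 loss already appears in the binary setting.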