Reviews: Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Neural Information Processing Systems 

Summary: This paper proposes a new reduction from multi-class classification to binary classification that is especially suitable when the number of classes is very large. They consider a hypothesis that map (input,class) pairs to scores, and the underlying loss function counts the fraction of the wrong classes that are scored higher than the true class. More specifically, they suppose they have a feature transformation phi that maps (input,class) pairs to a p-dimensional feature space, and they learn a mapping from R p to scores. Their reduction extends the work of Joshi et al. (2015) which, for each data point (x,y), creates K-1 transformed points where each transformed point intuitively corresponds to the comparison of label y with some incorrect label y'. Given that the transformed dataset contains correlated training examples, many standard generalization bounds cannot be applied.