We have extended our empirical proof of maximal informativeness to k = 15.

Neural Information Processing Systems 

We thank the reviewers for the thought-provoking questions and helpful comments on improving the manuscript.

R1, R2, & R3: The LLW hinge loss is calibrated with respect to the 0-1 loss while the WW hinge loss is not. A: The LLW SVM performs worse for a reason unrelated to calibration. Doğan et al. [2016] (page 20) give an explanation for the worse performance of all SVM variants that use absolute margins. Hence, the poor performance of LLW is a consequence of its use of absolute rather than relative margins.

R2, R3 & R4: Why is consistency with respect to the ordered partition loss desirable?
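The absolute- versus relative-margin distinction between the two losses can be made concrete with a minimal sketch. The formulations below follow the standard WW (Weston-Watkins) and LLW (Lee-Lin-Wahba) definitions; the function names and example scores are illustrative, not taken from the manuscript.

```python
import numpy as np

def ww_hinge(scores, y):
    """WW hinge loss: penalizes the RELATIVE margins f_y - f_j for j != y."""
    margins = scores[y] - np.delete(scores, y)
    return np.sum(np.maximum(0.0, 1.0 - margins))

def llw_hinge(scores, y):
    """LLW hinge loss: penalizes the ABSOLUTE scores f_j of the wrong
    classes (scores are assumed to satisfy a sum-to-zero constraint)."""
    return np.sum(np.maximum(0.0, 1.0 + np.delete(scores, y)))

scores = np.array([2.0, -1.0, -1.0])  # sums to zero; class 0 well separated
print(ww_hinge(scores, 0))   # 0.0 -- all relative margins >= 1
print(llw_hinge(scores, 0))  # 0.0 -- all wrong-class scores <= -1
```

Note how LLW ignores the score of the correct class entirely: only the absolute values of the wrong-class scores matter, which is the property Doğan et al. [2016] identify as the cause of its weaker empirical performance.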