Reviews: Supervised learning through the lens of compression

Neural Information Processing Systems 

Most of the results established in the paper would, in the special case of binary classification, trivially follow from the known upper and lower bounds on sample complexity based on the VC dimension. However, the results were not previously known for multiclass learning or for more general loss functions. The results for the 0-1 loss are not particularly surprising, but it is good to know that, for instance, in multiclass classification with the 0-1 loss, the complexity measure governing the agnostic sample complexity is the same as the one governing the realizable case (up to log factors, and with no extra factors, such as log(|Y|), beyond those already present in the realizable-case sample complexity). The authors also prove, in Theorem 3.6, a tighter lower bound than previously known on the sample complexity of uniform convergence for multiclass classification. The techniques used in the proofs are mostly straightforward or have appeared previously in related contexts.
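For concreteness, the binary-classification baselines alluded to above are the standard VC-based rates (with $d$ the VC dimension, $\epsilon$ the error parameter, $\delta$ the confidence parameter; this is well-known background, not a claim of the paper):

$$\tilde{\Theta}\!\left(\frac{d + \log(1/\delta)}{\epsilon}\right) \;\text{(realizable)}, \qquad \Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon^{2}}\right) \;\text{(agnostic)},$$

where the $\tilde{\Theta}$ hides a possible $\log(1/\epsilon)$ factor in the realizable upper bound. The point of the paragraph above is that the paper recovers this same realizable/agnostic relationship, with the appropriate complexity measure in place of $d$, in the multiclass setting.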