On the benefits of output sparsity for multi-label classification