AITopics | Sashank Reddi

Multilabel reductions: what is my loss optimising?

Aditya K. Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Neural Information Processing SystemsMar-27-2025, 00:48:34 GMT

Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging. A popular approach to this problem is to employ a reduction to a suitable series of binary or multiclass problems (e.g., computing a softmax based cross-entropy over the relevant labels). While such methods have seen empirical success, less is understood about how well they approximate two fundamental performance measures: precision@k and recall@k. In this paper, we study five commonly used reductions, including the one-versus-all reduction, a reduction to multiclass classification, and normalised versions of the same, wherein the contribution of each instance is normalised by the number of relevant labels. Our main result is a formal justification of each reduction: we explicate their underlying risks, and show they are each consistent with respect to either precision or recall. Further, we show that in general no reduction can be optimal for both measures. We empirically validate our results, demonstrating scenarios where normalised reductions yield recall gains over unnormalised counterparts.

artificial intelligence, machine learning, reduction, (16 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.48)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Adaptive Methods for Nonconvex Optimization

Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

Neural Information Processing SystemsMar-26-2025, 13:58:17 GMT

However, it has been recently demonstrated that such methods can fail to converge even in simple convex optimization settings. In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size. Our analysis shows that under this scenario such methods do converge to stationarity up to the statistical limit of variance in the stochastic gradients (scaled by a constant factor). In particular, our result implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the nonconvergence issues.

experiment, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

Neural Information Processing SystemsMar-25-2025, 15:55:59 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, regularizer, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > New York (0.28)
North America > Canada > British Columbia (0.28)

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Multilabel reductions: what is my loss optimising?

Aditya K. Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Neural Information Processing SystemsJan-27-2025, 11:30:15 GMT

Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging. A popular approach to this problem is to employ a reduction to a suitable series of binary or multiclass problems (e.g., computing a softmax based cross-entropy over the relevant labels). While such methods have seen empirical success, less is understood about how well they approximate two fundamental performance measures: precision@k and recall@k. In this paper, we study five commonly used reductions, including the one-versus-all reduction, a reduction to multiclass classification, and normalised versions of the same, wherein the contribution of each instance is normalised by the number of relevant labels. Our main result is a formal justification of each reduction: we explicate their underlying risks, and show they are each consistent with respect to either precision or recall. Further, we show that in general no reduction can be optimal for both measures. We empirically validate our results, demonstrating scenarios where normalised reductions yield recall gains over unnormalised counterparts.

artificial intelligence, machine learning, reduction, (16 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > New York (0.15)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Add feedback

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

Neural Information Processing SystemsJan-24-2025, 22:45:40 GMT

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches. It is a natural generalization from the graph Laplacian and spread-out regularizers, and empirically it addresses the drawback of each regularizer alone when applied to the extreme classification setup. With the proposed techniques, we attain or improve upon the state-of-the-art on most widely tested public extreme classification datasets with hundreds of thousands of labels.

artificial intelligence, machine learning, regularizer, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > New York (0.28)
North America > Canada > British Columbia (0.28)

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adaptive Methods for Nonconvex Optimization

Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

Neural Information Processing SystemsOct-7-2024, 18:49:13 GMT

However, it has been recently demonstrated that such methods can fail to converge even in simple convex optimization settings. In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size. Our analysis shows that under this scenario such methods do converge to stationarity up to the statistical limit of variance in the stochastic gradients (scaled by a constant factor). In particular, our result implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the nonconvergence issues.

experiment, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.93)

Technology: