AITopics | embedding-based classifier

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Neural Information Processing Systems, Dec-25-2025, 14:47:29 GMT

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches. It is a natural generalization from the graph Laplacian and spread-out regularizers, and empirically it addresses the drawback of each regularizer alone when applied to the extreme classification setup. With the proposed techniques, we attain or improve upon the state-of-the-art on most widely tested public extreme classification datasets with hundreds of thousands of labels.
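The abstract names the two ingredients of GLaS without giving formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. The graph Laplacian penalty is the standard form; the spread-out penalty is a simplified variant (mean squared inner product over distinct pairs); and `gram_matching_penalty`, which matches the Gram matrix of label embeddings to a normalized label co-occurrence matrix, is one hypothetical way of combining the two and is not taken from this listing. All function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cooccurrence(Y):
    """A[i][j] = number of examples tagged with both label i and label j."""
    k = len(Y[0])
    return [[sum(row[i] * row[j] for row in Y) for j in range(k)] for i in range(k)]

def laplacian_penalty(V, A):
    """0.5 * sum_ij A_ij * ||v_i - v_j||^2: pulls co-occurring labels together."""
    k = len(V)
    total = 0.0
    for i in range(k):
        for j in range(k):
            diff = [a - b for a, b in zip(V[i], V[j])]
            total += 0.5 * A[i][j] * dot(diff, diff)
    return total

def spreadout_penalty(V):
    """Simplified spread-out term: mean squared inner product over distinct pairs."""
    k = len(V)
    pairs = [(i, j) for i in range(k) for j in range(k) if i != j]
    return sum(dot(V[i], V[j]) ** 2 for i, j in pairs) / len(pairs)

def gram_matching_penalty(V, A):
    """ASSUMED combined form: (1/K^2) * ||V V^T - T||_F^2 with
    T_ij = 0.5 * (A_ij / A_jj + A_ij / A_ii). Requires every label to occur
    at least once so that the diagonal of A is nonzero."""
    k = len(V)
    total = 0.0
    for i in range(k):
        for j in range(k):
            target = 0.5 * (A[i][j] / A[j][j] + A[i][j] / A[i][i])
            total += (dot(V[i], V[j]) - target) ** 2
    return total / (k * k)

# Toy data: labels 0 and 1 always co-occur, label 2 appears alone.
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
A = cooccurrence(Y)

V_grouped = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # co-occurring labels share a direction
V_collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # every label mapped to the same point

lap_collapsed = laplacian_penalty(V_collapsed, A)       # zero: Laplacian alone accepts collapse
spread_grouped = spreadout_penalty(V_grouped)           # nonzero: spread-out alone penalizes grouping
gram_grouped = gram_matching_penalty(V_grouped, A)      # zero for the grouped layout
gram_collapsed = gram_matching_penalty(V_collapsed, A)  # nonzero for the collapsed layout
```

On this toy example each individual penalty shows a weakness: the Laplacian term is fully satisfied by collapsing all label embeddings to one point, while the spread-out term penalizes two labels that genuinely co-occur for sharing a direction. The assumed combined term prefers the grouped layout over the collapsed one.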

embedding-based classifier, glass ceiling, name change, (6 more...)

Neural Information Processing Systems

Industry: Law > Civil Rights & Constitutional Law (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.84)


AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations

Zhu, Jianfeng, Maharjan, Julina, Li, Xinyu, Coifman, Karin G., Jin, Ruoming

arXiv.org Artificial Intelligence, Oct-17-2025

Mental health disorders remain among the leading causes of disability worldwide, yet conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed due to subjective assessments, limited clinical resources, stigma, and low awareness. In primary care settings, studies show that providers misidentify depression or anxiety in over 60% of cases, highlighting the urgent need for scalable, accessible, and context-aware diagnostic tools that can support early detection and intervention. In this study, we evaluate the effectiveness of machine learning models for mental health screening using a unique dataset of 553 real-world, semi-structured interviews, each paired with ground-truth diagnoses for major depressive episodes (MDE), anxiety disorders, and PTSD. We benchmark multiple model classes, including zero-shot prompting with GPT-4.1 Mini and Meta-LLaMA, as well as fine-tuned RoBERTa models using Low-Rank Adaptation (LoRA). Our models achieve over 80% accuracy across diagnostic categories, with especially strong performance on PTSD (up to 89% accuracy and 98% recall). We also find that using shorter, focused context segments improves recall, suggesting that focused narrative cues enhance detection sensitivity. LoRA fine-tuning proves both efficient and effective, with lower-rank configurations (e.g., rank 8 and 16) maintaining competitive performance across evaluation metrics. Our results demonstrate that LLM-based models can offer substantial improvements over traditional self-report screening tools, providing a path toward low-barrier, AI-powered early diagnosis. This work lays the groundwork for integrating machine learning into real-world clinical workflows, particularly in low-resource or high-stigma environments where access to timely mental health care is most limited.
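To make the rank 8 and rank 16 configurations concrete, here is a minimal sketch of the general LoRA update, h = W x + (alpha / r) B (A x), with the base weight W frozen and only the low-rank factors A and B trained. The function names, the toy matrices, the value of alpha, and the hidden size of 768 (that of RoBERTa-base) are assumptions for illustration; the abstract does not state which RoBERTa size, target modules, or scaling the study used.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """h = W x + (alpha / r) * B (A x).
    W is d_out x d_in (frozen); A is r x d_in and B is d_out x r (trainable)."""
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def lora_trainable_params(d_in, d_out, r):
    """Parameters in the two low-rank factors of a single adapted matrix."""
    return r * (d_in + d_out)

# Toy layer with d_in = 3, d_out = 2, rank r = 1 (all values hypothetical).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]
x = [1.0, 2.0, 3.0]

# B is conventionally initialized to zero, so the adapted layer starts
# out identical to the frozen base layer.
B_zero = [[0.0], [0.0]]
h_start = lora_forward(W, A, B_zero, x, alpha=2.0)

# After training moves B away from zero, the low-rank update takes effect.
B_trained = [[1.0], [0.0]]
h_trained = lora_forward(W, A, B_trained, x, alpha=2.0)

# Trainable parameters per adapted d x d matrix, assuming d = 768.
d = 768
full_params = d * d
lora_params = {r: lora_trainable_params(d, d, r) for r in (8, 16)}
```

Under these assumptions, a rank-8 adapter trains 12,288 parameters per 768 x 768 matrix and a rank-16 adapter trains 24,576, against 589,824 for the full matrix, which is the sense in which low ranks are efficient.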

disorder, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.14937

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Reviews: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Neural Information Processing Systems, Jan-24-2025, 22:45:41 GMT

Prior literature cites the low dimensionality of the embedding as the reason for the poor performance of embedding-based methods. In this paper, the authors argue that the final score vector for the labels is actually generated by a highly non-linear transformation, such as thresholding the scores. Thus it is not clear that a low-rank structure in the score vectors directly implies a low-rank structure in the label vectors. Furthermore, the authors show that a simple neural network mimicking the low-dimensional embedding can attain near-perfect training accuracy but generalizes poorly, suggesting that overfitting is the root cause of the poor performance of embedding-based methods. This is the first contribution of the paper, which breaks the glass ceiling of embedding-based methods.
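The thresholding point can be illustrated with a small worked example. This is a constructed toy case, not an experiment from the paper: 3-dimensional embeddings are chosen so that the score of input i against label j equals -(i - j)^2, giving a score matrix of rank 3, yet thresholding those scores yields the identity label matrix, which has full rank. The helper `matrix_rank` and all variable names are illustrative.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank by Gaussian elimination over exact rationals (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

n = 8  # number of inputs and of labels in the toy problem

# Input embedding u_i = (i^2, i, 1) and label embedding v_j = (-1, 2j, -j^2),
# so that u_i . v_j = -i^2 + 2ij - j^2 = -(i - j)^2.
U = [[i * i, i, 1] for i in range(n)]
V = [[-1, 2 * j, -j * j] for j in range(n)]
scores = [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]

# Thresholding: label j is predicted for input i when the score exceeds -1,
# which for integer i and j happens only when i == j.
labels = [[1 if s > -1 else 0 for s in row] for row in scores]

score_rank = matrix_rank(scores)
label_rank = matrix_rank(labels)
```

Here `score_rank` is bounded by the embedding dimension of 3, while `label_rank` equals the number of labels, so a low-rank score matrix does not force a low-rank label matrix once a threshold is applied.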

laplacian regularizer, regularizer, spread-out regularizer, (12 more...)

Neural Information Processing Systems

Industry: Law > Civil Rights & Constitutional Law (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Reviews: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Neural Information Processing Systems, Jan-24-2025, 22:45:31 GMT

There is some disagreement among the reviewers about the significance of the paper. Three steps can be distinguished. The first is to refute the common belief that low-dimensional embeddings act as bottlenecks that limit accuracy in the extreme classification case. Here, while it is true (as raised by reviewer 1) that a representation result does not imply computational achievability, I feel that this objection reverses the direction of justification. If someone could show that common optimization methods fail to find embeddings (which "exist"), then this would reinstate the argument, yet in a more refined and precise form.

embedding-based classifier, glass ceiling, output space, (2 more...)

Neural Information Processing Systems

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology: Information Technology > Artificial Intelligence (0.49)


Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Guo, Chuan, Mousavi, Ali, Wu, Xiang, Holtmann-Rice, Daniel N., Kale, Satyen, Reddi, Sashank, Kumar, Sanjiv

Neural Information Processing Systems, Mar-18-2020, 22:31:44 GMT

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches.

embedding-based classifier, embedding-based method, glass ceiling, (3 more...)

Neural Information Processing Systems

Industry: Law > Civil Rights & Constitutional Law (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
