AITopics | long-tailed distribution

Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data

Yoon, Heegeon, Kim, Heeyoung

arXiv.org Machine Learning, May-12-2026

As datasets continue to expand in size and complexity, these models have become increasingly sophisticated, with deeper architectures and greater expressive power. Despite these advances, DNNs trained on imbalanced class distributions often exhibit a tendency to favor majority classes, leading to degraded performance on underrepresented classes [18, 39, 27, 17]. Because many real-world datasets follow long-tailed distributions in which minority classes can contain critical and informative patterns, developing methods that enable DNNs to learn effectively from imbalanced data is essential to prevent the loss of valuable information from these rare classes [26, 34, 16]. Moreover, data encountered in real-world applications are frequently multi-modal, meaning that observations originate from heterogeneous sources [6, 29, 7, 35]. To make effective use of such heterogeneous inputs, a wide range of multi-modal learning approaches have been proposed that exploit complementary information across modalities to enhance predictive performance [10, 5]. Common strategies integrate multiple modalities into a unified representation, using techniques that span from straightforward feature-level concatenation [19, 11, 12] to more sophisticated neural architectures that learn joint representations in an end-to-end manner [20, 32]. Although prior research has extensively studied class imbalance and multi-modal data separately, relatively little attention has been given to settings where both challenges arise simultaneously. Developing methods that can effectively handle long-tailed class distributions in conjunction with multi-modal inputs is therefore essential in many real-world applications. In the medical domain, for instance, datasets often contain far more samples from healthy individuals than from patients with specific conditions, while also encompassing diverse data types such as imaging data (e.g., X-rays) alongside auxiliary information including demographics and clinical histories.
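The simplest fusion strategy mentioned above, feature-level concatenation, can be sketched in a few lines. This is only an illustration of the general idea, not the method of the paper: the function name `concat_fusion`, the feature values, and the meaning given to each covariate are hypothetical.

```python
def concat_fusion(image_features, tabular_features):
    """Feature-level fusion: join the per-modality vectors end to end."""
    return list(image_features) + list(tabular_features)


# Hypothetical sample: a 4-dimensional image embedding (e.g. from an X-ray
# encoder) and 3 clinical covariates. All values are made up for illustration.
image_embedding = [0.12, -0.40, 0.88, 0.05]
clinical_covariates = [63.0, 1.0, 0.0]

# The fused vector would be the input to a single downstream classifier.
fused = concat_fusion(image_embedding, clinical_covariates)
print(len(fused))
```

Concatenation keeps each modality's features unchanged and leaves all cross-modal interaction to the downstream model, which is why the abstract contrasts it with architectures that learn joint representations end to end.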

artificial intelligence, deep learning, machine learning, (16 more...)

2605.10498

Genre: Research Report (0.82)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

f5fcd88d3deb97bb62559208cfa0ab62-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 08:09:49 GMT

artificial intelligence, category, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

85cd8edc556709341b2ef6c4d5725545-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-16-2026, 08:15:15 GMT

data mining, machine learning, natural language, (18 more...)

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > Virginia (0.04)
North America > United States > Oregon (0.04)
North America > United States > Iowa > Story County > Ames (0.04)

Genre:

Research Report (0.67)
Overview (0.46)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

Neural Information Processing Systems, Dec-27-2025, 05:29:14 GMT

Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed Fed-GraB, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using Fed-GraB, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that Fed-GraB achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
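The Fed-LT setting described above (heterogeneous local datasets whose union is long-tailed) can be simulated with a short sketch. This does not implement Fed-GraB, SGB, or DPA. It assumes an exponential class-size profile and a Dirichlet split across clients, both common choices for building such benchmarks but not confirmed by the abstract; the function names and all parameter values are hypothetical.

```python
import random


def long_tailed_counts(n_max, num_classes, imbalance_ratio):
    """Exponential profile: class 0 has n_max samples, the last class
    has roughly n_max / imbalance_ratio samples."""
    return [
        round(n_max * imbalance_ratio ** (-k / (num_classes - 1)))
        for k in range(num_classes)
    ]


def split_across_clients(global_counts, num_clients, alpha, rng):
    """Assign each class's samples to clients with Dirichlet(alpha) proportions.
    Smaller alpha gives more heterogeneous (non-IID) clients."""
    client_counts = [[0] * len(global_counts) for _ in range(num_clients)]
    for k, n_k in enumerate(global_counts):
        # Gamma(alpha, 1) draws are unnormalised Dirichlet weights;
        # random.choices normalises the weights internally.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        for owner in rng.choices(range(num_clients), weights=weights, k=n_k):
            client_counts[owner][k] += 1
    return client_counts


rng = random.Random(0)  # fixed seed so the simulated split is reproducible
global_counts = long_tailed_counts(n_max=5000, num_classes=10, imbalance_ratio=100)
client_counts = split_across_clients(global_counts, num_clients=5, alpha=0.5, rng=rng)

# Summing the per-client class counts recovers the global long-tailed profile.
recovered = [sum(per_class) for per_class in zip(*client_counts)]
print(global_counts)
print(recovered == global_counts)
```

In this simulation the global counts are known by construction. In the setting the abstract describes, no party can observe them directly, which is the difficulty it lists as challenge (a).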

fed-grab, federated long-tailed learning, long-tailed distribution, (7 more...)

Industry: Information Technology > Security & Privacy (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.76)
Information Technology > Security & Privacy (0.59)

Combating Representation Learning Disparity with Geometric Harmonization

Neural Information Processing Systems, Dec-24-2025, 20:54:19 GMT

Self-supervised learning (SSL), as an effective paradigm of representation learning, has achieved tremendous success on various curated datasets in diverse scenarios. Nevertheless, when facing the long-tailed distributions of real-world applications, it is still hard for existing methods to capture transferable and robust representations. The reason is that vanilla SSL methods that pursue sample-level uniformity easily lead to representation learning disparity, where head classes with large sample numbers dominate the feature regime while tail classes with small sample numbers passively collapse. To address this problem, we propose a novel Geometric Harmonization (GH) method to encourage category-level uniformity in representation learning, which is more benign to the minority and hardly hurts the majority under long-tailed distributions. Specifically, GH measures the population statistics of the embedding space on top of self-supervised learning, and then infers a fine-grained instance-wise calibration to constrain the space expansion of head classes and avoid the passive collapse of tail classes. Our proposal does not alter the setting of SSL and can be easily integrated into existing methods in a low-cost manner. Extensive results on a range of benchmark datasets show the effectiveness of GH with high tolerance to the distribution skewness.

combating representation learning disparity, geometric harmonization, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

Hou, Yaxin, Han, Bo, Jia, Yuheng, Liu, Hui, Hou, Junhui

arXiv.org Artificial Intelligence, Dec-12-2025

Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy. The code is available at https://github.com/yaxinhou/CPG.
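Step (ii) relies on logit adjustment with a known label distribution. A minimal sketch of the generic post-hoc form of that technique follows: subtract the scaled log class prior from each logit before taking the argmax. It is not taken from the CPG repository; the function name, the temperature `tau`, and the example counts and logits are assumptions made for illustration.

```python
import math


def logit_adjusted_prediction(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: predict argmax_y (logit_y - tau * log(prior_y)).
    With tau = 0 this reduces to the plain argmax over the raw logits."""
    adjusted = [z - tau * math.log(p) for z, p in zip(logits, class_priors)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Hypothetical labeled-set class counts (head, medium, tail) and raw logits.
counts = [900, 90, 10]
priors = [c / sum(counts) for c in counts]
logits = [2.0, 1.5, 0.5]

plain = logit_adjusted_prediction(logits, priors, tau=0.0)
balanced = logit_adjusted_prediction(logits, priors, tau=1.0)
print(plain, balanced)  # the head class wins before adjustment, the tail class after
```

The adjustment needs the class priors, which is why the cycle in the abstract insists that the updated labeled dataset follow a known distribution.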

artificial intelligence, inductive learning, machine learning, (18 more...)

2510.03993

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Unlabeled Data vs. Pre-trained Knowledge: Rethinking SSL in the Era of Large Models

Lv, Song-Lin, Zhu, Rui, Wei, Tong, Li, Yu-Feng, Guo, Lan-Zhe

arXiv.org Artificial Intelligence, Oct-28-2025

Semi-supervised learning (SSL) reduces the cost of data labeling by exploiting unlabeled data and has achieved promising results. Meanwhile, with the development of large foundation models, exploiting pre-trained models, for example through parameter-efficient fine-tuning techniques, has become a promising way to address label scarcity in downstream tasks. This raises a natural yet critical question: when labeled data is limited, should we rely on unlabeled data or pre-trained models? To investigate this issue, we conduct a fair comparison between SSL methods and pre-trained models (e.g., CLIP) on representative image classification tasks under a controlled supervision budget. Experiments reveal that SSL has met its "Waterloo" in the era of large models, as pre-trained models show both high efficiency and strong performance on widely adopted SSL benchmarks. This underscores the urgent need for SSL researchers to explore new avenues, such as deeper integration between SSL and pre-trained models. Furthermore, we investigate the potential of Multi-Modal Large Language Models (MLLMs) in image classification tasks. Results show that, despite their massive parameter scales, MLLMs still face significant performance limitations, highlighting that even a seemingly well-studied task remains highly challenging.

artificial intelligence, machine learning, pre-trained model, (15 more...)

2505.13317

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
