AITopics | imbalanced

imbalanced

Optimized Deferral for Imbalanced Settings

Cortes, Corinna, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao

arXiv.org Machine Learning | May-1-2026

Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.

large language model, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2604.27723

Country: North America (0.46)

Genre: Research Report (0.40)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

Neural Information Processing Systems | Feb-12-2026, 05:52:53 GMT

In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > France (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(3 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (0.94)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplementary Distrib for Imbalanced

Neural Information Processing Systems | Feb-9-2026, 17:15:39 GMT

To this coordinate (1), and necessary and solution of (1). We use the model trained using MixMatch [5] under 3 cases: (1) l = 100, u = 1, (2) = l = u = 100 (reverse) and (3) = 100.

artificial intelligence, machine learning, supplementary distrib, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

Neural Information Processing Systems | Dec-25-2025, 10:52:17 GMT

Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this gap, we present Bank Account Fraud (BAF), the first publicly available privacy-preserving, large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized, real-world bank account opening fraud detection dataset. This setting carries a set of challenges that are commonplace in real-world applications, including temporal dynamics and significant class imbalance. Additionally, to allow practitioners to stress test both performance and fairness of ML methods, each dataset variant of BAF contains specific types of data bias. With this resource, we aim to provide the research community with a more realistic, complete, and robust test bed to evaluate novel and existing methods.

dynamic tabular dataset, imbalanced, name change, (4 more...)

Neural Information Processing Systems

Industry: Law Enforcement & Public Safety > Fraud (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

CYCle: Choosing Your Collaborators Wisely to Enhance Collaborative Fairness in Decentralized Learning

Tastan, Nurbek, Horvath, Samuel, Nandakumar, Karthik

arXiv.org Artificial Intelligence | Jan-21-2025

Collaborative learning (CL) enables multiple participants to jointly train machine learning (ML) models on decentralized data sources without raw data sharing. While the primary goal of CL is to maximize the expected accuracy gain for each participant, it is also important to ensure that the gains are fairly distributed. Specifically, no client should be negatively impacted by the collaboration, and the individual gains must ideally be commensurate with the contributions. Most existing CL algorithms require central coordination and focus on the gain maximization objective while ignoring collaborative fairness. In this work, we first show that the existing measure of collaborative fairness based on the correlation between accuracy values without and with collaboration has drawbacks because it does not account for negative collaboration gain. We argue that maximizing mean collaboration gain (MCG) while simultaneously minimizing the collaboration gain spread (CGS) is a fairer alternative. Next, we propose the CYCle protocol that enables individual participants in a private decentralized learning (PDL) framework to achieve this objective through a novel reputation scoring method based on gradient alignment between the local cross-entropy and distillation losses. Experiments on the CIFAR-10, CIFAR-100, and Fed-ISIC2019 datasets empirically demonstrate the effectiveness of the CYCle protocol to ensure positive and fair collaboration gain for all participants, even in cases where the data distributions of participants are highly skewed. For the simple mean estimation problem with two participants, we also theoretically show that CYCle performs better than standard FedAvg, especially when there is large statistical heterogeneity.
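The contrast drawn in this abstract between a correlation-based fairness measure and the MCG/CGS pair can be illustrated with a small sketch. It assumes, since the abstract does not give the formulas, that a participant's collaboration gain is its accuracy with collaboration minus its standalone accuracy, that MCG is the mean of those gains, and that CGS is their population standard deviation. The accuracy values and function names are hypothetical.

```python
from statistics import mean, pstdev

def collaboration_gains(standalone, collaborative):
    # Assumed definition: gain = accuracy with collaboration - accuracy without.
    return [c - s for s, c in zip(standalone, collaborative)]

def pearson(xs, ys):
    # Plain Pearson correlation coefficient between two equal-length lists.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical accuracies for three participants: everyone loses 5 points.
standalone = [0.60, 0.70, 0.80]
collaborative = [0.55, 0.65, 0.75]

gains = collaboration_gains(standalone, collaborative)
mcg = mean(gains)      # mean collaboration gain (assumed: arithmetic mean)
cgs = pstdev(gains)    # collaboration gain spread (assumed: population std)
corr = pearson(standalone, collaborative)

print(f"correlation={corr:.3f}  MCG={mcg:.3f}  CGS={cgs:.3f}")
```

In this toy case the correlation between standalone and collaborative accuracies is essentially perfect even though every participant is worse off, while a negative MCG exposes the harm directly.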

artificial intelligence, machine learning, participant, (15 more...)

arXiv.org Artificial Intelligence

2501.12344

Country:

North America > United States > Virginia (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > France (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government (0.67)
Law > Statutes (0.67)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Security & Privacy (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

Neural Information Processing Systems | Jan-19-2025, 01:05:10 GMT

dynamic tabular dataset, imbalanced, ml evaluation, (2 more...)

Neural Information Processing Systems

Industry: Law Enforcement & Public Safety > Fraud (0.63)

Technology: Information Technology > Artificial Intelligence (1.00)

Offline Clustering Approach to Self-supervised Learning for Class-imbalanced Image Data

Chang, Hye-min, Chang, Sungkyun

arXiv.org Artificial Intelligence | Dec-21-2022

Class-imbalanced datasets are known to cause the problem of the model being biased towards the majority classes. In this project, we set up two research questions: 1) when is the class-imbalance problem more prevalent in self-supervised pre-training? and 2) can offline clustering of feature representations help pre-training on class-imbalanced data? Our experiments investigate the former question by adjusting the degree of class-imbalance when training the baseline models, namely SimCLR and SimSiam, on the CIFAR-10 database. To answer the latter question, we train each expert model on each subset of the feature clusters. We then distill the knowledge of expert models into a single model, so that we will be able to compare the performance of this model to our baselines.
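One common way to adjust the degree of class imbalance in a balanced dataset such as CIFAR-10 is to subsample classes along an exponential profile, so that the largest class is `imbalance_ratio` times the size of the smallest. The sketch below shows that idea on a toy label list. It is an assumption for illustration, not the procedure the authors describe, and `make_imbalanced` and its parameters are hypothetical names.

```python
import random
from collections import Counter

def make_imbalanced(labels, imbalance_ratio, seed=0):
    """Return sorted indices of a subset with exponentially decaying class sizes.

    The first class (in sorted order) keeps n_max samples and the last keeps
    about n_max / imbalance_ratio, where n_max is the smallest original class size.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    classes = sorted(by_class)
    n_max = min(len(indices) for indices in by_class.values())
    keep = []
    for rank, c in enumerate(classes):
        # Fraction decays from 1 (rank 0) to 1 / imbalance_ratio (last rank).
        frac = imbalance_ratio ** (-rank / max(len(classes) - 1, 1))
        n_keep = max(1, round(n_max * frac))
        keep.extend(rng.sample(by_class[c], n_keep))
    return sorted(keep)

# Toy stand-in for a balanced 10-class dataset with 100 samples per class.
labels = [c for c in range(10) for _ in range(100)]
subset = make_imbalanced(labels, imbalance_ratio=10)
counts = Counter(labels[i] for i in subset)
print(sorted(counts.items()))
```

Setting `imbalance_ratio=1` leaves the dataset balanced, and larger values give a longer tail.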

artificial intelligence, machine learning, simsiam, (17 more...)

arXiv.org Artificial Intelligence

2212.11444

Country: North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Your Dataset Is Imbalanced? Do Nothing!

#artificialintelligence | Aug-24-2022, 15:15:59 GMT

"We have a problem: this dataset is imbalanced." After saying this, they usually start proposing balancing techniques to "fix" the dataset, such as undersampling, oversampling, SMOTE, and whatnot. I think this is one of the most widespread misconceptions in the machine learning community. Class imbalance is not a problem! In this article, I will explain to you why.

class imbalance, dataset, imbalance, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
