AITopics | unbalanced data

Collaborating Authors

unbalanced data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling

Neural Information Processing SystemsJun-14-2026, 06:52:36 GMT

Learning the underlying dynamics of single cells from snapshot data has gained increasing attention in scientific and machine learning research.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.57)

Add feedback

A Set of Rules for Model Validation

Camacho, José

arXiv.org Machine LearningNov-27-2025

The validation of a data-driven model is the process of asses sing the model's ability to generalize to new, unseen data in the population o f interest. This paper proposes a set of general rules for model validation. T hese rules are designed to help practitioners create reliable validation plans and report their results transparently. While no validation scheme is flawle ss, these rules can help practitioners ensure their strategy is sufficient for pr actical use, openly discuss any limitations of their validation strategy, and r eport clear, comparable performance metrics. Keywords: Validation, Cross-validation 1. Introduction Model validation is a fundamental task in all modern data-dr iven systems, whether they fall under the broad categories of Statistics, Machine Learning (ML), Artificial Intelligence (AI), or more specialized fiel ds like chemometrics. Validation has become a major focus for regulatory and stand ardization bodies, with key reports and standards highlighting the growing con cern for ensuring the trustworthiness and reliability of data-driven models: NIST AI Risk Management Framework (AI RMF 1.0, 2023): Publi shed by the U.S. Department of Commerce, this framework provides management techniques to address the risks and ensure the trustwor thiness of AI systems, with validation as a core component. The EU AI Act of 2024, landmark piece of EU legislation that c ategorizes AI systems by risk level, where validation is not defined as a b est practice but a legal requirement within the conformity assessment. The ISO/IEC TS 4213:2022, by the International Organizati on for Standardization (ISO), describes approaches and methods to ens ure the rele-Email address: josecamacho@ugr.es The IEEE P2841 -2022 is a recommended practice for the fram ework and process for deep learning evaluation.

application, selection, validation, (17 more...)

arXiv.org Machine Learning

2511.20711

Country:

North America > United States (0.68)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

29405e2a4c22866a205f557559c7fa4b-Supplemental.pdf

Neural Information Processing SystemsOct-2-2025, 12:59:23 GMT

artificial intelligence, machine learning, rumi-lsgan, (16 more...)

Neural Information Processing Systems

Country: Asia > India (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models

Du, Junliang, Dou, Shiyu, Yang, Bohuan, Hu, Jiacheng, An, Tai

arXiv.org Artificial IntelligenceFeb-5-2025

This paper studies a Markov network model for unbalanced data, aiming to solve the problems of classification bias and insufficient minority class recognition ability of traditional machine learning models in environments with uneven class distribution. By constructing joint probability distribution and conditional dependency, the model can achieve global modeling and reasoning optimization of sample categories. The study introduced marginal probability estimation and weighted loss optimization strategies, combined with regularization constraints and structured reasoning methods, effectively improving the generalization ability and robustness of the model. In the experimental stage, a real credit card fraud detection dataset was selected and compared with models such as logistic regression, support vector machine, random forest and XGBoost. The experimental results show that the Markov network performs well in indicators such as weighted accuracy, F1 score, and AUC-ROC, significantly outperforming traditional classification models, demonstrating its strong decision-making ability and applicability in unbalanced data scenarios. Future research can focus on efficient model training, structural optimization, and deep learning integration in large-scale unbalanced data environments and promote its wide application in practical applications such as financial risk control, medical diagnosis, and intelligent monitoring.

artificial intelligence, machine learning, markov network, (17 more...)

arXiv.org Artificial Intelligence

2502.03386

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.69)

Industry:

Banking & Finance (1.00)
Health & Medicine (0.89)
Law Enforcement & Public Safety > Fraud (0.69)
Information Technology > Security & Privacy (0.48)

Add feedback

DeepLearningBrasil@LT-EDI-2023: Exploring Deep Learning Techniques for Detecting Depression in Social Media Text

Garcia, Eduardo, Gomes, Juliana, Júnior, Adalberto Barbosa, Borges, Cardeque, da Silva, Nádia

arXiv.org Artificial IntelligenceNov-8-2023

In this paper, we delineate the strategy employed by our team, DeepLearningBrasil, which secured us the first place in the shared task DepSign-LT-EDI@RANLP-2023, achieving a 47.0% Macro F1-Score and a notable 2.4% advantage. The task was to classify social media texts into three distinct levels of depression - "not depressed," "moderately depressed," and "severely depressed." Leveraging the power of the RoBERTa and DeBERTa models, we further pre-trained them on a collected Reddit dataset, specifically curated from mental health-related Reddit's communities (Subreddits), leading to an enhanced understanding of nuanced mental health discourse. To address lengthy textual data, we used truncation techniques that retained the essence of the content by focusing on its beginnings and endings. Our model was robust against unbalanced data by incorporating sample weights into the loss. Cross-validation and ensemble techniques were then employed to combine our k-fold trained models, delivering an optimal solution. The accompanying code is made available for transparency and further development.

dataset, depression, depsign-lt-edi, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.26615/978-954-452-084-7_042

2311.05047

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
South America > Brazil > Goiás (0.04)
Europe > Bulgaria > Varna Province > Varna (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

Zhao, Puning, Yu, Fei, Wan, Zhiguo

arXiv.org Artificial IntelligenceAug-24-2023

Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under independent and identically distributed (i.i.d) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $\epsilon$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $\epsilon$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d data, such that clients have slightly different distributions.

artificial intelligence, assumption, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2308.12581

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Credit card fraud detection - Classifier selection strategy

Kulatilleke, Gayan K.

arXiv.org Artificial IntelligenceAug-25-2022

Machine learning has opened up new tools for financial fraud detection. Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds. With growing credit card transaction volumes and rising fraud percentages there is growing interest in finding appropriate machine learning classifiers for detection. However, fraud data sets are diverse and exhibit inconsistent characteristics. As a result, a model effective on a given data set is not guaranteed to perform on another. Further, the possibility of temporal drift in data patterns and characteristics over time is high. Additionally, fraud data has massive and varying imbalance. In this work, we evaluate sampling methods as a viable pre-processing mechanism to handle imbalance and propose a data-driven classifier selection strategy for characteristic highly imbalanced fraud detection data sets. The model derived based on our selection strategy surpasses peer models, whilst working in more realistic conditions, establishing the effectiveness of the strategy.

card fraud detection, credit card fraud detection, fraud detection, (12 more...)

arXiv.org Artificial Intelligence

2208.119

Country:

North America > United States > California > Orange County > Irvine (0.04)
Asia > Taiwan (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Add feedback

Using Mixed-Effects Models to Learn Bayesian Networks from Related Data Sets

Scutari, Marco, Marquis, Christopher, Azzimonti, Laura

arXiv.org Artificial IntelligenceAug-4-2022

We commonly assume that data are a homogeneous set of observations when learning the structure of Bayesian networks. However, they often comprise different data sets that are related but not homogeneous because they have been collected in different ways or from different populations. In our previous work (Azzimonti, Corani and Scutari, 2021), we proposed a closed-form Bayesian Hierarchical Dirichlet score for discrete data that pools information across related data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. In this paper, we provide an analogous solution for learning a Bayesian network from continuous data using mixed-effects models to pool information across the related data sets. We study its structural, parametric, predictive and classification accuracy and we show that it outperforms both conditional Gaussian Bayesian networks (that do not perform any pooling) and classical Gaussian Bayesian networks (that disregard the heterogeneous nature of the data). The improvement is marked for low sample sizes and for unbalanced data sets.

artificial intelligence, cgbn, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2206.03743

Country: Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Imbalanced Data? Stop Using ROC-AUC and Use AUPRC Instead

#artificialintelligenceJun-7-2022, 07:22:07 GMT

The Receiver Operating Characteristic -- Area Under the Curve (ROC-AUC) measure is widely used to assess the performance of binary classifiers. However, sometimes, it is more appropriate to evaluate your classifier based on measuring the Area Under the Precision-Recall Curve (AUPRC). We will present a detailed comparison between these two measures, accompanied by empirical results and graphical illustrations. Scikit-learn experiments are also available in a corresponding notebook. I'll assume you're familiar with precision and recall and the elements of the confusion matrix (TP, FN, FP, TN).

classifier, prediction, probability, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback