Assessing reliability of explanations in unbalanced datasets: a use-case on the occurrence of frost events
Vascotto, Ilaria, Blasone, Valentina, Rodriguez, Alex, Bonaita, Alessandro, Bortolussi, Luca
The usage of eXplainable Artificial Intelligence (XAI) methods has become essential in practical applications, given the increasing deployment of Artificial Intelligence (AI) models and the legislative requirements put forward in recent years. A fundamental but often underestimated property of explanations is their robustness, which should be satisfied in order to trust them. In this study, we provide preliminary insights on evaluating the reliability of explanations in the specific case of unbalanced datasets, which are very frequent in high-risk use-cases but at the same time considerably challenging for both AI models and XAI methods. We propose a simple evaluation focused on the minority class (i.e. the less frequent one) that leverages on-manifold generation of neighbours, explanation aggregation, and a metric to test explanation consistency. We present a use-case based on a tabular dataset with numerical features, focusing on the occurrence of frost events.
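The consistency evaluation described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's method: it uses Gaussian perturbations as a stand-in for on-manifold neighbour generation, and cosine similarity between attribution vectors as the consistency metric; `explain_fn` and all parameter values are illustrative assumptions.

```python
import numpy as np

def neighbour_consistency(explain_fn, x, n_neighbours=50, scale=0.05, seed=0):
    """Consistency of an attribution method around x: mean cosine
    similarity between the explanation of x and the explanations of
    perturbed neighbours. Gaussian noise is a simple stand-in for
    the on-manifold neighbour generation the paper relies on."""
    rng = np.random.default_rng(seed)
    e_x = explain_fn(x)
    sims = []
    for _ in range(n_neighbours):
        x_p = x + rng.normal(0.0, scale, size=x.shape)
        e_p = explain_fn(x_p)
        sims.append(np.dot(e_x, e_p)
                    / (np.linalg.norm(e_x) * np.linalg.norm(e_p) + 1e-12))
    return float(np.mean(sims))

# Toy linear model whose attribution is gradient-times-input (w * x):
w = np.array([2.0, -1.0, 0.5])
explain = lambda x: w * x
score = neighbour_consistency(explain, np.array([1.0, 1.0, 1.0]))
```

A score near 1 means the explanation barely changes across neighbours; for minority-class points of an unbalanced dataset one would compute this per instance and aggregate.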
DP-TabICL: In-Context Learning with Differentially Private Tabular Data
Carey, Alycia N., Bhaila, Karuna, Edemacu, Kennedy, Wu, Xintao
In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks by conditioning on demonstrations of question-answer pairs and it has been shown to have comparable performance to costly model retraining and fine-tuning. Recently, ICL has been extended to allow tabular data to be used as demonstration examples by serializing individual records into natural language formats. However, it has been shown that LLMs can leak information contained in prompts, and since tabular data often contain sensitive information, understanding how to protect the underlying tabular data used in ICL is a critical area of research. This work serves as an initial investigation into how to use differential privacy (DP) -- the long-established gold standard for data privacy and anonymization -- to protect tabular data used in ICL. Specifically, we investigate the application of DP mechanisms for private tabular ICL via data privatization prior to serialization and prompting. We formulate two private ICL frameworks with provable privacy guarantees in both the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios via injecting noise into individual records or group statistics, respectively. We evaluate our DP-based frameworks on eight real-world tabular datasets and across multiple ICL and DP settings. Our evaluations show that DP-based ICL can protect the privacy of the underlying tabular data while achieving comparable performance to non-LLM baselines, especially under high privacy regimes.
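The record-privatization step described in the DP-TabICL abstract (noise injection before serialization, in the local-DP setting) might look like the following sketch. Function names, feature bounds, and the serialization template are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def laplace_privatize(record, lower, upper, epsilon, rng=None):
    """Local-DP privatization of one numeric record: clip each feature
    to [lower, upper], then add Laplace noise whose scale is calibrated
    to the per-feature sensitivity (upper - lower) / epsilon."""
    rng = rng or np.random.default_rng(0)
    x = np.clip(np.asarray(record, dtype=float), lower, upper)
    scale = (np.asarray(upper) - np.asarray(lower)) / epsilon
    return x + rng.laplace(0.0, scale, size=x.shape)

def serialize(values, names):
    """Turn a privatized record into a natural-language demonstration."""
    return ", ".join(f"{n} is {v:.2f}" for n, v in zip(names, values))

# illustrative record with two numeric features
priv = laplace_privatize([35.0, 52000.0],
                         lower=np.array([18.0, 0.0]),
                         upper=np.array([90.0, 100000.0]),
                         epsilon=1.0)
prompt = serialize(priv, ["age", "income"])
```

Only the privatized text ever reaches the LLM prompt, so the guarantee holds regardless of what the model does downstream; smaller `epsilon` means stronger privacy and noisier demonstrations.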
Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection
Kar, Ankan, Nath, Nirjhar, Kemprai, Utpalraj, Aman, null
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.
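The SVM training pipeline discussed above (resize, flatten, fit, score) can be sketched with scikit-learn. Synthetic arrays stand in for the labeled fire/no-fire image patches, and the resize resolution `res` is the knob the article's resolution study varies; all data and parameter values here are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for resized RGB patches: "fire" patches are
# biased toward brighter channel values, "no fire" patches are not.
rng = np.random.default_rng(0)
res = 16                                  # resize resolution (res x res)
n = 200
X_fire = rng.normal(0.6, 0.2, (n, res * res * 3))
X_none = rng.normal(0.4, 0.2, (n, res * res * 3))
X = np.vstack([X_fire, X_none])
y = np.array([1] * n + [0] * n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling matters for RBF kernels on high-dimensional pixel features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Repeating this with different `res` values reproduces the kind of accuracy-vs-resolution curve the article discusses, since `res` directly controls the dimensionality the SVM must cope with.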
Alleviating the Effect of Data Imbalance on Adversarial Training
Li, Guanlin, Xu, Guowen, Zhang, Tianwei
In this paper, we study adversarial training on datasets that obey the long-tailed distribution, which is practical but rarely explored in previous works. Compared with conventional adversarial training on balanced datasets, this process falls into the dilemma of generating uneven adversarial examples (AEs) and an unbalanced feature embedding space, causing the resulting model to exhibit low robustness and accuracy on tail data. To address this, we theoretically analyze the lower bound of the robust risk when training a model on a long-tailed dataset, identifying the key challenges behind the aforementioned dilemma. Based on this analysis, we propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT). This framework consists of two components: (1) a new training strategy inspired by the effective number of samples to guide the model to generate more balanced and informative AEs; (2) a carefully constructed penalty function to enforce a satisfactory feature space. Evaluation results on different datasets and model structures prove that REAT can effectively enhance the model's robustness and preserve the model's clean accuracy. The code can be found at https://github.com/GuanlinLee/REAT.
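The "effective number" the REAT abstract refers to is commonly computed as E_n = (1 - beta^n) / (1 - beta) per class (Cui et al., "Class-Balanced Loss"). The sketch below shows how such class weights might be derived; the normalisation choice and beta value are illustrative assumptions, not REAT's exact recipe.

```python
import numpy as np

def effective_number_weights(counts, beta=0.999):
    """Class weights from the effective number of samples,
    E_n = (1 - beta**n) / (1 - beta). Rarer (tail) classes get
    larger weights; weights are normalised so they sum to the
    number of classes."""
    counts = np.asarray(counts, dtype=float)
    eff_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff_num
    return w * len(counts) / w.sum()

# long-tailed toy distribution: head class 5000 samples, tail class 50
weights = effective_number_weights([5000, 500, 50])
```

Such weights could then rebalance either the loss or the per-class budget of adversarial examples, which is the role the effective number plays in REAT's first component.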
Credit Card Fraud Detection
The dataset, available on Kaggle, originates from European credit card companies. It contains financial transactions over a two-day period, in which 492 frauds were detected among nearly 290,000 transactions. As we can already notice, this is an unbalanced dataset, where fraud accounts for only 0.17% of the total. Another detail is that the features are all numerical and have been anonymized (for privacy and security reasons). The original data page reports that the variables were transformed using Principal Component Analysis (PCA).
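To make the imbalance described above concrete: the sketch below checks the ~0.17% fraud rate and shows one standard countermeasure, class reweighting, on a tiny synthetic stand-in (the real dataset's PCA features are not used here; all data and parameters are illustrative assumptions).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Figures as reported for the Kaggle dataset (approximate total):
frauds, total = 492, 290_000
rate = frauds / total            # ~0.0017, i.e. about 0.17%

# Tiny synthetic stand-in with a heavy skew; class_weight="balanced"
# reweights each class inversely to its frequency during fitting.
rng = np.random.default_rng(0)
n_pos = 20
X_neg = rng.normal(0.0, 1.0, (2000, 4))
X_pos = rng.normal(1.5, 1.0, (n_pos, 4))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 2000 + [1] * n_pos)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
recall = float((clf.predict(X_pos) == 1).mean())  # minority-class recall
```

Without reweighting, a classifier on data this skewed can reach ~99.8% accuracy by always predicting "not fraud", which is why minority-class recall, not accuracy, is the metric to watch.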
Neural Network Classifier as Mutual Information Evaluator
Qin, Zhenyue, Kim, Dongwoo, Gedeon, Tom
Cross-entropy loss with softmax output is a standard choice to train neural network classifiers. We give a new view of neural network classifiers with softmax and cross-entropy as mutual information evaluators. We show that when the dataset is balanced, training a neural network with cross-entropy maximises the mutual information between inputs and labels through a variational form of mutual information. Thereby, we develop a new form of softmax that also converts a classifier to a mutual information evaluator when the dataset is imbalanced. Experimental results show that the new form leads to better classification accuracy, in particular for imbalanced datasets.
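The abstract above does not spell out its new softmax form, so the sketch below shows a related, widely used variant rather than the paper's: adjusting logits by the log class priors (as in balanced/logit-adjusted softmax), which counteracts the bias toward frequent classes on imbalanced data. Treat the formulation as an assumption for illustration only.

```python
import numpy as np

def prior_adjusted_softmax(logits, class_counts):
    """Softmax with log-prior adjustment: adding log p(y) to the
    logits during training counteracts the head-class bias learned
    on imbalanced data (a common variant; the paper's exact
    formulation may differ)."""
    priors = np.asarray(class_counts, dtype=float) / np.sum(class_counts)
    z = logits + np.log(priors)
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# With equal logits, the adjusted probabilities reduce to the priors:
p = prior_adjusted_softmax(np.array([2.0, 2.0]), [900, 100])
```

At test time one would drop the adjustment (plain softmax), so the model's logits themselves must compensate for the imbalance it saw during training.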
Not All Mistakes Are Created Equal: Cost-sensitive Learning
In classification problems, we often assume that every misclassification is equally bad. Consider the example of trying to classify whether or not there is a terrorist threat. There are two types of misclassifications: either we predict there is a threat but there is actually no threat (false positive), or we predict there is no threat but there actually is a threat (false negative). Clearly the false negative is much more dangerous than the false positive -- we might end up wasting time and money in the false positive case, but people might die in the false negative case. We call classification problems like this cost-sensitive.
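The threat example above can be made precise with a cost matrix and a minimum-expected-cost decision rule. The cost values here are illustrative assumptions (a false negative taken as 100x worse than a false positive).

```python
import numpy as np

# cost[i][j] = cost of predicting class j when the truth is class i
# classes: 0 = no threat, 1 = threat
cost = np.array([[0.0,   1.0],    # truth: no threat  (FP costs 1)
                 [100.0, 0.0]])   # truth: threat     (FN costs 100)

def min_cost_decision(p_threat, cost):
    """Pick the prediction minimising expected cost under the model's
    posterior [P(no threat), P(threat)]."""
    posterior = np.array([1.0 - p_threat, p_threat])
    expected = posterior @ cost       # expected cost of each prediction
    return int(np.argmin(expected))

# Even a 5% threat probability triggers an alert under these costs
# (expected cost 0.95 for alerting vs 5.0 for not alerting), while a
# plain argmax over the posterior would stay silent.
alert = min_cost_decision(0.05, cost)
plain = int(np.argmax([0.95, 0.05]))
```

The decision threshold follows directly from the cost ratio: alert whenever P(threat) exceeds 1/(1 + 100), i.e. just under 1%, rather than the symmetric 50% of cost-insensitive classification.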