AITopics | hardness estimator

Identifying Key Challenges of Hardness-Based Resampling

Pukowski, Pawel, Osmani, Venet

arXiv.org Artificial Intelligence, Apr-10-2025

Performance gaps across classes remain a persistent challenge in machine learning, often attributed to variations in class hardness. One way to quantify class hardness is through sample complexity: the minimum number of samples required to effectively learn a given class. Sample complexity theory suggests that class hardness is driven by differences in the amount of data required for generalization. That is, harder classes need substantially more samples to achieve generalization. Therefore, hardness-based resampling is a promising approach to mitigate these performance disparities. While resampling has been studied extensively in data-imbalanced settings, its impact on balanced datasets remains unexplored. This raises the fundamental question of whether resampling is effective because it addresses data imbalance or hardness imbalance. We begin addressing this question by introducing class imbalance into balanced datasets and evaluating its effect on performance disparities. We oversample hard classes and undersample easy classes to bring hard classes closer to their sample complexity requirements while maintaining a constant dataset size for fairness. We estimate class-level hardness using the Area Under the Margin (AUM) hardness estimator and leverage it to compute resampling ratios. Using these ratios, we perform hardness-based resampling on the well-known CIFAR-10 and CIFAR-100 datasets. Contrary to theoretical expectations, our results show that hardness-based resampling does not meaningfully affect class-wise performance disparities. To explain this discrepancy, we conduct detailed analyses to identify key challenges unique to hardness-based imbalance, distinguishing it from traditional data-based imbalance. Our insights help explain why theoretical sample complexity expectations fail to translate into practical performance gains, and we provide guidelines for future research.
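The resampling pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The margin definition (assigned-class logit minus the largest other logit, averaged over epochs) follows the usual AUM formulation. The class-level aggregation (`class_hardness`), the softmax-style allocation rule, and the `temperature` parameter are assumptions introduced here, since the abstract does not specify how the resampling ratios are computed.

```python
import math
from collections import defaultdict


def sample_margin(logits, label):
    """Margin at one epoch: assigned-class logit minus the largest other logit."""
    largest_other = max(v for i, v in enumerate(logits) if i != label)
    return logits[label] - largest_other


def area_under_margin(logits_per_epoch, label):
    """AUM of one sample: its margin averaged over the recorded epochs."""
    margins = [sample_margin(logits, label) for logits in logits_per_epoch]
    return sum(margins) / len(margins)


def class_hardness(aum_values, labels):
    """Assumed class-level score: negated mean AUM, so larger means harder."""
    per_class = defaultdict(list)
    for aum, y in zip(aum_values, labels):
        per_class[y].append(aum)
    return {c: -sum(v) / len(v) for c, v in per_class.items()}


def resampling_targets(hardness, samples_per_class, temperature=1.0):
    """Assumed rule: a softmax over hardness splits a fixed sample budget.

    The total stays at samples_per_class * num_classes, so harder classes
    are oversampled and easier classes undersampled.
    """
    classes = sorted(hardness)
    total = samples_per_class * len(classes)
    weights = [math.exp(hardness[c] / temperature) for c in classes]
    norm = sum(weights)
    raw = [total * w / norm for w in weights]
    targets = [int(r) for r in raw]  # floor, since every share is positive
    # Hand out the samples lost to flooring by largest fractional remainder.
    leftover = total - sum(targets)
    order = sorted(range(len(classes)), key=lambda i: raw[i] - targets[i],
                   reverse=True)
    for i in order[:leftover]:
        targets[i] += 1
    return dict(zip(classes, targets))
```

Dividing each target by `samples_per_class` gives a per-class resampling ratio: values above 1 mean oversampling and values below 1 mean undersampling. A larger `temperature` flattens the allocation toward the original balanced split.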
Access to large datasets has fueled recent machine learning breakthroughs [1], yet data efficiency remains a critical challenge [2]. Addressing this challenge requires a deep understanding of instance-, class-, and dataset-level hardness. Empirical works reveal large performance gaps across classes [3]-[5] and clear distinctions between easy and hard (e.g. ImageNet) datasets [6], [7], as shown in Figure 1.

Pawel Pukowski is with the University of Sheffield. Venet Osmani is with the Digital Environment Research Institute, Queen Mary University of London. This work has been submitted to the IEEE for possible publication.

Figure 1: Training an ensemble of ten ResNet18 networks on CIFAR-10 (left) and CIFAR-100 (right) reveals large recall gaps across classes, despite the balanced nature of these datasets. Paired with significantly larger recall gaps across classes for CIFAR-100 than for CIFAR-10, this shows class- and dataset-level hardness discrepancies, which we call hardness-based imbalance.
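The recall gaps referred to in Figure 1 can be measured along these lines. This is a hedged sketch: averaging per-class recall over ensemble members and defining the gap as the highest minus the lowest class recall are assumptions made here for illustration, and the function names are hypothetical.

```python
def per_class_recall(y_true, y_pred, num_classes):
    """Recall of each class: correct predictions divided by true occurrences."""
    correct = [0] * num_classes
    support = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            correct[t] += 1
    # A class with no true samples gets recall 0.0 to avoid division by zero.
    return [c / s if s else 0.0 for c, s in zip(correct, support)]


def ensemble_recall(y_true, preds_per_model, num_classes):
    """Assumed aggregation: mean per-class recall over the ensemble members."""
    recalls = [per_class_recall(y_true, preds, num_classes)
               for preds in preds_per_model]
    return [sum(r[c] for r in recalls) / len(recalls)
            for c in range(num_classes)]


def recall_gap(recalls):
    """Spread between the best- and worst-recalled class."""
    return max(recalls) - min(recalls)
```

On a balanced test set, a large `recall_gap` cannot be explained by class frequencies, which is the situation the caption describes as hardness-based imbalance.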

artificial intelligence, hardness estimator, machine learning, (12 more...)

2504.07031

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
