 confident learning


Bias-Aware Mislabeling Detection via Decoupled Confident Learning

arXiv.org Artificial Intelligence

Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to a quantitative analysis, such that its quality differs across social groups. This type of bias has been conceptually and empirically explored and is widely recognized as a pressing issue across critical domains. However, effective methodologies for addressing it remain scarce. In this work, we propose Decoupled Confident Learning (DeCoLe), a principled machine-learning-based framework specifically designed to detect mislabeled instances in datasets affected by label bias, enabling bias-aware mislabeling detection and facilitating data quality improvement. We theoretically justify the effectiveness of DeCoLe and evaluate its performance in the impactful context of hate speech detection, a domain where label bias is a well-documented challenge. Empirical results demonstrate that DeCoLe excels at bias-aware mislabeling detection, consistently outperforming alternative approaches for label error detection. Our work identifies and addresses the challenge of bias-aware mislabeling detection and offers guidance on how DeCoLe can be integrated into organizational data management practices as a powerful tool to enhance data reliability.


Class-wise Autoencoders Measure Classification Difficulty And Detect Label Mistakes

arXiv.org Artificial Intelligence

We introduce a new framework for analyzing classification datasets based on the ratios of reconstruction errors between autoencoders trained on individual classes. This analysis framework enables efficient characterization of datasets on the sample, class, and entire dataset levels. We define reconstruction error ratios (RERs) that probe classification difficulty and allow its decomposition into (1) finite sample size and (2) Bayes error and decision-boundary complexity. Through systematic study across 19 popular visual datasets, we find that our RER-based dataset difficulty probe strongly correlates with error rate for state-of-the-art (SOTA) classification models. By interpreting sample-level classification difficulty as a label mistakenness score, we further find that RERs achieve SOTA performance on mislabel detection tasks on hard datasets under symmetric and asymmetric label noise. Data is the cornerstone of modern machine learning. As the data-centric AI movement has made increasingly clear, both predictive and generative ML models rely on sufficiently large and diverse high-quality datasets (Deng et al., 2009b; Radford et al., 2018; Kaplan et al., 2020). However, it is well known that even popular visual datasets like CIFAR-100 (Krizhevsky & Hinton, 2009), Caltech-256 (Griffin et al., 2007), and ImageNet (Deng et al., 2009b) can have hundreds or thousands of data quality issues, including up to 10% label errors (Northcutt et al., 2021). Consequently, curating a high-quality dataset requires not only data collection but also data cleaning, characterization, evaluation, and refinement. Nevertheless, existing methods for data quality assessment are inherently limited. Methods that seek to estimate the classification difficulty of a sample or dataset are either model-dependent (Ethayarajh et al., 2021), computationally infeasible (Scheidegger et al., 2021), or break down when applied to challenging datasets (Zhang et al., 2020). 
Likewise, mislabel detection methods either rely on training a strong classifier on the dataset (Pruthi et al., 2020; Pleiss et al., 2020), which becomes more time- and compute-intensive for more complex datasets, or exhibit degraded performance on datasets with complex decision boundaries (Zhu et al., 2021; Northcutt et al., 2021). To address these limitations, we propose a novel approach for characterizing the difficulty of classification datasets by decomposing complex multi-class classification problems into one manifold learning problem for each class.
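
To make the idea of a reconstruction error ratio concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: each per-class autoencoder is replaced by a deliberately crude stand-in that "reconstructs" every input as the class centroid, and the ratio is assumed to be the error under the sample's own class divided by the smallest error under any other class. The function names `fit_centroid`, `reconstruction_error`, and `reconstruction_error_ratios` are hypothetical.

```python
def fit_centroid(points):
    """Stand-in for training a class autoencoder: just the class mean (assumption)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def reconstruction_error(point, centroid):
    """Squared distance between a sample and its stand-in 'reconstruction'."""
    return sum((x - c) ** 2 for x, c in zip(point, centroid))


def reconstruction_error_ratios(features, labels):
    """Own-class error divided by the best other-class error, per sample.

    A ratio above 1 means some other class's model reconstructs the sample
    better than the model for its given label (assumed form of the ratio).
    """
    classes = sorted(set(labels))
    centroids = {
        k: fit_centroid([f for f, y in zip(features, labels) if y == k])
        for k in classes
    }
    ratios = []
    for f, y in zip(features, labels):
        own = reconstruction_error(f, centroids[y])
        other = min(reconstruction_error(f, centroids[k]) for k in classes if k != y)
        ratios.append(own / other if other > 0 else float("inf"))
    return ratios


# Toy data: the sample at index 3 sits in the class-1 cluster but is labeled 0.
features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 4), (4, 5), (6, 5)]
labels = [0, 0, 0, 0, 1, 1, 1]
ratios = reconstruction_error_ratios(features, labels)
most_suspicious = max(range(len(ratios)), key=ratios.__getitem__)
print(most_suspicious, [round(r, 3) for r in ratios])
```

Sorting samples by this ratio in descending order gives a simple mistakenness ranking; with real autoencoders the same ranking logic would apply to their reconstruction errors.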


Mitigating Label Bias in Machine Learning: Fairness through Confident Learning

arXiv.org Artificial Intelligence

Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias, resulting in biased datasets that unfairly harm specific groups and cause classifiers to inherit these biases. In this paper, we demonstrate that despite only having access to the biased labels, it is possible to eliminate bias by filtering the fairest instances within the framework of confident learning. In the context of confident learning, low self-confidence usually indicates potential label errors; however, this is not always the case. Instances, particularly those from underrepresented groups, might exhibit low confidence scores for reasons other than labeling errors. To address this limitation, our approach employs truncation of the confidence score and extends the confidence interval of the probabilistic threshold. Additionally, we incorporate the co-teaching paradigm to provide a more robust and reliable selection of fair instances and to effectively mitigate the adverse effects of biased labels. Through extensive experimentation and evaluation on various datasets, we demonstrate the efficacy of our approach in promoting fairness and reducing the impact of label bias in machine learning models.


Identifying Incorrect Annotations in Multi-Label Classification Data

arXiv.org Artificial Intelligence

In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes). Example applications include image (or document) tagging where each possible tag either applies to a particular image (or document) or not. With many possible classes to consider, data annotators are likely to make errors when labeling such data in practice. Here we consider algorithms for finding mislabeled examples in multi-label classification datasets. We propose an extension of the Confident Learning framework to this setting, as well as a label quality score that ranks examples with label errors much higher than those which are correctly labeled. Both approaches can utilize any trained classifier. After demonstrating that our methodology empirically outperforms other algorithms for label error detection, we apply our approach to discover many label errors in the CelebA image tagging dataset.
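
The abstract does not spell out the scoring formula, so the following Python sketch only illustrates the general shape of a multi-label label quality score. It assumes each tag is treated as an independent binary decision, scored by the classifier's probability of the given annotation, and that the per-tag scores are aggregated by taking the minimum. Both the aggregation rule and the function names (`per_class_self_confidence`, `label_quality_score`, `rank_by_suspicion`) are assumptions for illustration, not the paper's method.

```python
def per_class_self_confidence(given_tags, pred_probs):
    """Probability the classifier assigns to each given 0/1 tag annotation."""
    return [p if t == 1 else 1.0 - p for t, p in zip(given_tags, pred_probs)]


def label_quality_score(given_tags, pred_probs):
    """Assumed aggregation: an example is only as trustworthy as its worst tag."""
    return min(per_class_self_confidence(given_tags, pred_probs))


def rank_by_suspicion(annotations, probabilities):
    """Return example indices ordered from most to least suspicious, plus scores."""
    scores = [label_quality_score(t, p) for t, p in zip(annotations, probabilities)]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order, scores


# Toy data with three possible tags per example.
annotations = [
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 1],  # third tag is given, but the classifier strongly disagrees
    [0, 0, 0],  # no tags given, but the classifier expects the second tag
]
probabilities = [
    [0.90, 0.10, 0.20],
    [0.20, 0.70, 0.90],
    [0.95, 0.05, 0.10],
    [0.10, 0.85, 0.10],
]
order, scores = rank_by_suspicion(annotations, probabilities)
print(order)
```

The probabilities can come from any trained classifier, ideally predicted out-of-sample so that the model has not memorized the annotations it is being used to audit.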


GitHub - cleanlab/cleanlab: The standard package for machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Works with most datasets and models.

#artificialintelligence

Check out the cleanlab code documentation. Past release notes and planned future features are available here. By default, cleanlab requires no hyper-parameters. Pre-computed out-of-sample predicted probabilities for the CIFAR-10 train set are available here: [[LINK]]. Check out these examples and tests (includes how to use PyTorch, FastText, etc.).


Confident Learning: Estimating Uncertainty in Dataset Labels

Journal of Artificial Intelligence Research

Learning exists in the context of data, yet notions of confidence typically focus on model predictions, not label quality. Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. Whereas numerous studies have developed these principles independently, here, we combine them, building on the assumption of a class-conditional noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This results in a generalized CL which is provably consistent and experimentally performant. We present sufficient conditions where CL exactly finds label errors, and show CL performance exceeding seven recent competitive approaches for learning with noisy labels on the CIFAR dataset. Uniquely, the CL framework is not coupled to a specific data modality or model (e.g., we use CL to find several label errors in the presumed error-free MNIST dataset and improve sentiment classification on text data in Amazon Reviews). We also employ CL on ImageNet to quantify ontological class overlap (e.g., estimating 645 missile images are mislabeled as their parent class projectile), and moderately increase model accuracy (e.g., for ResNet) by cleaning data prior to training. These results are replicable using the open-source cleanlab release.
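
The "counting with probabilistic thresholds" step can be sketched in a few lines of plain Python. This is a minimal illustration, not the cleanlab implementation: it assumes each class threshold is the mean predicted probability of that class over the examples given that label, it omits calibration and normalization of the counts into a joint distribution, and the helper names `class_thresholds` and `confident_joint` are hypothetical.

```python
def class_thresholds(labels, pred_probs, num_classes):
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    # A class with no examples gets threshold 1.0 so it is never confidently predicted.
    return [sums[k] / counts[k] if counts[k] else 1.0 for k in range(num_classes)]


def confident_joint(labels, pred_probs, num_classes):
    """Count examples by (given label, confidently predicted label).

    Off-diagonal counts estimate label noise; the examples behind them
    are returned as candidate label errors.
    """
    thresholds = class_thresholds(labels, pred_probs, num_classes)
    joint = [[0] * num_classes for _ in range(num_classes)]
    suspects = []
    for idx, (y, probs) in enumerate(zip(labels, pred_probs)):
        confident = [k for k in range(num_classes) if probs[k] >= thresholds[k]]
        if not confident:
            continue  # no class clears its threshold, so the example is not counted
        guess = max(confident, key=lambda k: probs[k])
        joint[y][guess] += 1
        if guess != y:
            suspects.append(idx)
    return joint, suspects


# Toy out-of-sample predicted probabilities for a two-class problem.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],  # given label 0, but the model is confident in class 1
    [0.15, 0.85],
    [0.30, 0.70],
    [0.10, 0.90],
]
joint, suspects = confident_joint(labels, pred_probs, num_classes=2)
print(joint, suspects)
```

Because the thresholds are computed per class, a class on which the model is systematically less confident is not penalized relative to a class on which it is more confident, which is what lets the counts tolerate class-conditional noise.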


An Introduction to Confident Learning: Finding and Learning with Label Errors in Datasets

#artificialintelligence

This post gives an overview of the paper Confident Learning: Estimating Uncertainty in Dataset Labels, authored by Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang. If you've ever used datasets like CIFAR, MNIST, ImageNet, or IMDB, you likely assumed the class labels are correct. Why? Characterizing and finding label errors in massive datasets in a principled way is challenging, and solutions are limited. Surprise: there are likely at least 100,000 label issues in ImageNet. In this post, I discuss an emerging, principled framework to identify label errors, characterize label noise, and learn with noisy labels known as confident learning (CL), open-sourced as the cleanlab Python package.


Confident Learning: Estimating Uncertainty in Dataset Labels

arXiv.org Machine Learning

Learning exists in the context of data, yet notions of $\textit{confidence}$ typically focus on model predictions, not label quality. Confident learning (CL) has emerged as an approach for characterizing, identifying, and learning with noisy labels in datasets, based on the principles of pruning noisy data, counting to estimate noise, and ranking examples to train with confidence. Here, we generalize CL, building on the assumption of a classification noise process, to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This generalized CL, open-sourced as $\texttt{cleanlab}$, is provably consistent under reasonable conditions, and experimentally performant on ImageNet and CIFAR, outperforming recent approaches, e.g. MentorNet, by $30\%$ or more, when label noise is non-uniform. $\texttt{cleanlab}$ also quantifies ontological class overlap, and can increase model accuracy (e.g. ResNet) by providing clean data for training.