Mislabeled Examples

Characterizing Datapoints via Second-Split Forgetting (supplementary material, A.1 Preliminaries)

Neural Information Processing Systems

We use the sample complexity required to estimate a distribution as a proxy for the complexity of that distribution. We make these assumptions to simplify the theoretical exposition; our results can still be observed after relaxing them, at the expense of more bookkeeping. Following Chatterji and Long [9], we make the following assumptions about the problem setup: (A.1) the labels of mislabeled examples are reversed.
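
A minimal sketch of assumption (A.1), simulating reversed labels for a random fraction of a binary-labeled sample; the noise rate, seed, and names below are illustrative, not taken from the paper.

    import numpy as np

    def reverse_labels(y, noise_rate, seed=0):
        """Reverse labels in {-1, +1} for a random subset (assumption A.1)."""
        rng = np.random.default_rng(seed)
        y_noisy = y.copy()
        flip = rng.random(len(y)) < noise_rate   # which examples are mislabeled
        y_noisy[flip] = -y_noisy[flip]           # reversed label, per (A.1)
        return y_noisy, flip

    # Example: a sample in which roughly 10% of labels are reversed.
    y = np.sign(np.random.default_rng(1).standard_normal(1000))
    y_noisy, is_mislabeled = reverse_labels(y, noise_rate=0.1)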



Identifying Mislabeled Data using the Area Under the Margin Ranking

Neural Information Processing Systems

Our goal is to automatically identify and subsequently remove mislabeled samples from training datasets. Discarding these harmful data will reduce memorization and improve generalization.
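
As a sketch of the statistic the title refers to: at each epoch the margin of an example is its assigned-label logit minus the largest other logit, and the area under the margin (AUM) averages this over training, with low-AUM examples flagged as likely mislabeled. The synthetic logits and cutoff below are placeholders, assuming logits are recorded once per epoch.

    import numpy as np

    def margin(logits, labels):
        """Assigned-label logit minus the largest other logit, per example."""
        n = len(labels)
        z_assigned = logits[np.arange(n), labels]
        masked = logits.copy()
        masked[np.arange(n), labels] = -np.inf   # exclude the assigned class
        return z_assigned - masked.max(axis=1)

    # AUM: average the margin over training epochs; synthetic stand-ins here.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 10, size=256)
    logits_per_epoch = [rng.standard_normal((256, 10)) for _ in range(5)]
    aum = np.mean([margin(z, labels) for z in logits_per_epoch], axis=0)
    suspects = np.argsort(aum)[:25]   # lowest-AUM examples (cutoff is illustrative)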


An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets

Srikanth, Maya, Irvin, Jeremy, Hill, Brian Wesley, Godoy, Felipe, Sabane, Ishan, Ng, Andrew Y.

arXiv.org Artificial Intelligence

Major advancements in computer vision can primarily be attributed to the use of labeled datasets. However, acquiring labels for datasets often introduces errors that can harm model performance. Recent works have proposed methods to automatically identify mislabeled images, but strategies for effectively applying them to real-world datasets have been sparsely explored. Towards improved data-centric methods for cleaning real-world vision datasets, we first conduct more than 200 experiments carefully benchmarking recently developed automated mislabel detection methods on multiple datasets under a variety of synthetic and real noise settings with varying noise levels. We compare these methods to a Simple and Efficient Mislabel Detector (SEMD) that we craft, and find that SEMD performs similarly to or outperforms prior mislabel detection approaches. We then apply SEMD to multiple real-world computer vision datasets and test how dataset size, mislabel removal strategy, and mislabel removal amount further affect model performance after retraining on the cleaned data. With careful design of the approach, we find that mislabel removal leads to per-class performance improvements of up to 8% for a retrained classifier in smaller data regimes.
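
A generic sketch of the clean-then-retrain workflow the abstract studies, with removal strategy and removal amount exposed as knobs; the scores, names, and per-class strategy here are hypothetical illustrations, not the authors' SEMD implementation.

    import numpy as np

    def remove_mislabels(scores, labels, strategy="per_class", frac=0.05):
        """Indices to keep after dropping the lowest-scoring fraction.

        `scores` are hypothetical detector confidences that a label is
        correct; `strategy` and `frac` mirror the removal-strategy and
        removal-amount knobs tested in the paper.
        """
        n = len(scores)
        if strategy == "global":
            drop = np.argsort(scores)[: int(frac * n)]
        else:  # drop the same fraction within every class
            drop = np.concatenate([
                idx[np.argsort(scores[idx])[: int(frac * len(idx))]]
                for c in np.unique(labels)
                for idx in [np.flatnonzero(labels == c)]
            ])
        return np.setdiff1d(np.arange(n), drop)  # retrain on data[keep]

    rng = np.random.default_rng(0)
    keep = remove_mislabels(rng.random(1000), rng.integers(0, 5, 1000))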


Late Stopping: Avoiding Confidently Learning from Mislabeled Examples

Yuan, Suqin, Feng, Lei, Liu, Tongliang

arXiv.org Artificial Intelligence

Sample selection is a prevalent method in learning with noisy labels, where small-loss examples are typically treated as correctly labeled. However, this criterion may fail to identify clean hard examples with large losses, which are critical for achieving close-to-optimal generalization. In this paper, we propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process. Specifically, Late Stopping gradually shrinks the noisy dataset by removing high-probability mislabeled examples while retaining the majority of clean hard examples in the training set throughout learning. We empirically observe that mislabeled and clean examples differ in the number of epochs required for them to be consistently and correctly classified, which allows high-probability mislabeled examples to be identified and removed. Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
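
A hedged sketch of the learning-time signal described above: record, for each example, the first epoch from which it remains correctly classified, and treat the latest-learned examples as high-probability mislabels. This compresses the paper's gradual, multi-round procedure into one pass, and all names and thresholds are illustrative.

    import numpy as np

    def first_consistent_epoch(correct):
        """First epoch from which each example stays correctly classified.

        `correct`: bool array of shape (epochs, n) recorded during a
        prolonged training run; returns `epochs` for never-learned examples.
        """
        epochs, n = correct.shape
        first = np.full(n, epochs)
        consistent = np.ones(n, dtype=bool)
        for t in range(epochs - 1, -1, -1):
            consistent &= correct[t]
            first[consistent] = t
        return first

    # Latest-learned examples are candidate mislabels (stand-in data below).
    record = np.random.default_rng(0).random((50, 1000)) < 0.8
    suspects = np.argsort(first_consistent_epoch(record))[-100:]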


Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction

Ling, Yue

arXiv.org Artificial Intelligence

The objective of this study is to develop natural language processing (NLP) models that can analyze patients' drug reviews and accurately classify their satisfaction levels as positive, neutral, or negative. Such models would reduce the workload of healthcare professionals and provide greater insight into patients' quality of life, a critical indicator of treatment effectiveness. To achieve this, we implemented and evaluated several classification models, including a BERT base model, Bio+Clinical BERT, and a simpler CNN. Results indicate that the medical domain-specific Bio+Clinical BERT model significantly outperformed the general-domain BERT base model, achieving an 11% improvement in macro F1 and recall scores, as shown in Table 2. Future research could explore how to capitalize on the specific strengths of each model: Bio+Clinical BERT excels in overall performance, particularly with medical jargon, while the simpler CNN demonstrates the ability to identify crucial words and accurately classify sentiment in texts with conflicting sentiments.
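
A minimal sketch of instantiating the three-class classifier with the publicly released Bio+Clinical BERT checkpoint (emilyalsentzer/Bio_ClinicalBERT on the Hugging Face Hub); the example review and the label ordering are placeholders, and fine-tuning is omitted.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Bio+Clinical BERT with a fresh 3-way head for positive/neutral/negative.
    name = "emilyalsentzer/Bio_ClinicalBERT"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

    review = "This medication controlled my symptoms with only mild nausea."
    inputs = tokenizer(review, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = ("negative", "neutral", "positive")[logits.argmax(-1).item()]  # order illustrative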