Detecting Label Errors by using Pre-Trained Language Models
Derek Chong, Jenny Hong, Christopher D. Manning
arXiv.org Artificial Intelligence
We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originated label noise into existing crowdsourced datasets such as SNLI and TweetNLP. We show that this noise has similar properties to real, hand-verified label errors, and is harder to detect than existing synthetic noise, creating challenges for model robustness. We argue that human-originated noise is a better standard for evaluation than synthetic noise. Finally, we use crowdsourced verification to evaluate the detection of real errors on IMDB, Amazon Reviews, and Recon, and confirm that pre-trained models perform at a 9-36% higher absolute Area Under the Precision-Recall Curve than existing models.
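The core ranking step described in the abstract, examining out-of-sample data points in descending order of task loss, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the out-of-sample class probabilities have already been produced by a fine-tuned model (for example via cross-validation), it takes the task loss to be cross-entropy against the given label, and the function name `rank_by_loss` and the toy numbers are hypothetical.

```python
import numpy as np

def rank_by_loss(probs, labels, eps=1e-12):
    """Return example indices sorted by descending cross-entropy loss.

    probs  : (n_examples, n_classes) out-of-sample predicted probabilities
    labels : (n_examples,) given (possibly erroneous) integer labels
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability the model assigns to each example's given label.
    p_given = probs[np.arange(len(labels)), labels]
    # Per-example cross-entropy loss; clip to avoid log(0).
    losses = -np.log(np.clip(p_given, eps, 1.0))
    # Highest loss first; stable sort keeps ties in original order.
    order = np.argsort(-losses, kind="stable")
    return order, losses

# Toy two-class example with made-up probabilities (assumption).
probs = [[0.9, 0.1], [0.2, 0.8], [0.05, 0.95], [0.6, 0.4]]
labels = [0, 1, 0, 1]
order, losses = rank_by_loss(probs, labels)
```

In this toy input, the third example is labeled 0 while the model assigns that label a probability of only 0.05, so it would be reviewed first as the most likely label error.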
Dec-15-2022
- Country:
  - Oceania
    - New Zealand (0.04)
    - Australia (0.04)
  - North America
    - Dominican Republic (0.04)
    - Canada (0.04)
    - United States
      - Virginia (0.04)
      - Maryland (0.04)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
      - Oregon > Multnomah County
        - Portland (0.04)
      - New York > New York County
        - New York City (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Georgia > Fulton County
        - Atlanta (0.04)
      - California > Santa Clara County
        - Palo Alto (0.04)
  - Europe
  - Asia
    - China > Hong Kong (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
- Genre:
  - Research Report > Promising Solution (0.48)
- Industry:
  - Media > Film (0.46)
  - Information Technology > Services (0.46)
- Technology: