Detecting Label Errors by using Pre-Trained Language Models

Chong, Derek, Hong, Jenny, Manning, Christopher D.

Dec-15-2022–arXiv.org Artificial Intelligence

We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originated label noise into existing crowdsourced datasets such as SNLI and TweetNLP. We show that this noise has similar properties to real, hand-verified label errors, and is harder to detect than existing synthetic noise, creating challenges for model robustness. We argue that human-originated noise is a better standard for evaluation than synthetic noise. Finally, we use crowdsourced verification to evaluate the detection of real errors on IMDB, Amazon Reviews, and Recon, and confirm that pre-trained models perform at a 9-36% higher absolute Area Under the Precision-Recall Curve than existing models.
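The ranking idea in the abstract, examining out-of-sample examples in descending order of task loss, can be illustrated with a minimal sketch. It assumes each example already has predicted class probabilities from a fine-tuned model that did not see that example during training (for instance via cross-validation). The function names, the toy data, and the use of average precision as a step-wise estimate of the area under the precision-recall curve are illustrative assumptions, not the authors' implementation.

```python
import math

def cross_entropy_losses(probs, labels):
    """Per-example loss -log p(given label), computed from predicted probabilities."""
    eps = 1e-12  # guard against log(0)
    return [-math.log(max(p[y], eps)) for p, y in zip(probs, labels)]

def rank_by_loss(probs, labels):
    """Return example indices sorted by loss, most suspicious (highest loss) first."""
    losses = cross_entropy_losses(probs, labels)
    return sorted(range(len(labels)), key=lambda i: losses[i], reverse=True)

def average_precision(ranking, is_error):
    """Average precision of a ranking against known error flags.

    Used here as a simple estimate of the area under the precision-recall curve.
    """
    hits = 0
    total = 0.0
    for rank, idx in enumerate(ranking, start=1):
        if is_error[idx]:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits if hits else 0.0

# Hypothetical toy data: out-of-sample probabilities for a two-class task.
probs = [
    [0.90, 0.10],  # given label 0, model agrees
    [0.20, 0.80],  # given label 0, model disagrees strongly
    [0.05, 0.95],  # given label 1, model agrees
    [0.60, 0.40],  # given label 1, model disagrees mildly
]
labels = [0, 0, 1, 1]
is_error = [False, True, False, False]  # assumed ground truth for evaluation

ranking = rank_by_loss(probs, labels)
score = average_precision(ranking, is_error)
```

In practice, a reviewer would inspect examples from the top of `ranking` first. The error flags are only needed to score a detector, as in the crowdsourced verification the abstract describes.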

large language model, machine learning, natural language

Country:
- Oceania
  - New Zealand (0.04)
  - Australia (0.04)
- North America
  - Dominican Republic (0.04)
  - Canada (0.04)
  - United States
    - Virginia (0.04)
    - Maryland (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - New York > New York County
      - New York City (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
    - California > Santa Clara County
      - Palo Alto (0.04)
- Europe
  - United Kingdom (0.04)
  - Ireland (0.04)
  - Belgium (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)

Genre:
- Research Report > Promising Solution (0.48)

Industry:
- Media > Film (0.46)
- Information Technology > Services (0.46)

Technology:
- Information Technology
  - Communications > Social Media
    - Crowdsourcing (0.69)
  - Artificial Intelligence
    - Natural Language > Large Language Model (0.68)
    - Machine Learning
      - Performance Analysis > Accuracy (1.00)
      - Statistical Learning (0.93)
      - Neural Networks > Deep Learning (0.68)
