Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

George, Thomas, Nodet, Pierre, Bondu, Alexis, Lemaire, Vincent

Oct-21-2024–arXiv.org Artificial Intelligence

Mislabeled examples are ubiquitous in real-world machine learning datasets, which motivates the development of techniques for detecting them automatically. We show that most mislabeled-example detection methods can be viewed as probing trained machine learning models using a few core principles. We formalize a modular framework that encompasses these methods, parameterized by only 4 building blocks, and provide a Python library demonstrating that these principles can be implemented in practice. The focus is on classifier-agnostic concepts, with an emphasis on adapting methods developed for deep learning models to non-deep classifiers for tabular data. We benchmark existing methods on (artificial) Noisy Completely At Random (NCAR) as well as (realistic) Noisy Not At Random (NNAR) labeling noise from a variety of tasks with imperfect labeling rules. This benchmark provides new insights into existing methods in this setup and exposes their limitations.
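
To make the two central ideas of the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' library, and the abstract does not name the four building blocks, so nothing here should be read as their framework. The function names `inject_ncar_noise` and `knn_trust_scores` are hypothetical. The first function flips labels completely at random (independently of features and class), and the second probes a stand-in model, a leave-one-out k-nearest-neighbour classifier, for how much it agrees with each example's given label; a low score suggests a possible mislabel.

```python
import random


def inject_ncar_noise(labels, noise_rate, n_classes, seed=0):
    """Flip each label with probability `noise_rate`, ignoring features and class.

    Hypothetical helper: a flipped label is drawn uniformly from the other classes.
    Returns the noisy labels and a boolean mask of which entries were flipped.
    """
    rng = random.Random(seed)
    noisy, flipped = [], []
    for y in labels:
        if rng.random() < noise_rate:
            others = [c for c in range(n_classes) if c != y]
            noisy.append(rng.choice(others))
            flipped.append(True)
        else:
            noisy.append(y)
            flipped.append(False)
    return noisy, flipped


def knn_trust_scores(points, labels, k=3):
    """Probe a leave-one-out k-NN model (an assumed stand-in for any classifier).

    The score of example i is the fraction of its k nearest neighbours
    (excluding i itself) that carry the same label as i.
    """
    scores = []
    for i, p in enumerate(points):
        # Squared Euclidean distance to every other point, ties broken by index.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        scores.append(sum(labels[j] == labels[i] for j in neighbours) / k)
    return scores


# Toy data: two tight clusters; the point at index 3 sits in the first
# cluster but carries the label of the second one.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
scores = knn_trust_scores(points, labels, k=3)
suspect = scores.index(min(scores))  # index 3 gets the lowest score
```

Ranking examples by such a score and flagging the lowest ones is only one simple way to turn a probe into a detector; the choice of model, probe, and threshold here is an illustrative assumption.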

artificial intelligence, inductive learning, machine learning, (19 more...)

Country:
- North America > United States
  - California (0.04)
  - Oregon > Multnomah County
    - Portland (0.04)
  - New York > New York County
    - New York City (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
- Europe
  - Switzerland (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Slovenia > Drava
    - Municipality of Benedikt > Benedikt (0.04)
- Asia
  - Singapore (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)

Genre:
- Research Report (1.00)
- Overview (1.00)
- Instructional Material > Course Syllabus & Notes (0.46)

Industry:
- Information Technology > Security & Privacy (0.45)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (1.00)
  - Inductive Learning (1.00)
