Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark
George, Thomas, Nodet, Pierre, Bondu, Alexis, Lemaire, Vincent
arXiv.org Artificial Intelligence
Mislabeled examples are ubiquitous in real-world machine learning datasets, which motivates the development of techniques for detecting them automatically. We show that most mislabeled-example detection methods can be viewed as probing trained machine learning models using a few core principles. We formalize a modular framework that encompasses these methods, parameterized by only 4 building blocks, and provide a Python library that demonstrates that these principles can actually be implemented. The focus is on classifier-agnostic concepts, with an emphasis on adapting methods developed for deep learning models to non-deep classifiers for tabular data. We benchmark existing methods on (artificial) Noisy Completely At Random (NCAR) as well as (realistic) Noisy Not At Random (NNAR) labeling noise from a variety of tasks with imperfect labeling rules. This benchmark provides new insights into existing methods and exposes their limitations in this setup.
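To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's library or its 4-block framework. It assumes that NCAR noise means each label is flipped with a fixed probability, independently of the features and of the true class, and it uses one simple probe as an illustration: the probability a trained model assigns to the observed label. The function names `inject_ncar_noise` and `rank_by_self_confidence` are hypothetical and chosen here for illustration only.

```python
import random


def inject_ncar_noise(labels, classes, noise_rate, seed=0):
    """Assumed NCAR model: flip each label with probability `noise_rate`,
    independently of features and true class. A flipped label is drawn
    uniformly from the remaining classes."""
    rng = random.Random(seed)
    noisy, flipped = [], []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
            flipped.append(True)
        else:
            noisy.append(y)
            flipped.append(False)
    return noisy, flipped


def rank_by_self_confidence(probas, labels):
    """Illustrative probe (an assumption, not the paper's method):
    `probas[i]` maps each class to the probability predicted by some
    trained model for example i. Examples whose observed label gets a
    low probability are ranked first as the most suspicious."""
    scores = [p[y] for p, y in zip(probas, labels)]
    return sorted(range(len(labels)), key=lambda i: scores[i])
```

In practice the probabilities would come from a model that did not see the example during training (for instance out-of-fold predictions), since a model fit on a noisy label may simply memorize it.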
Oct-21-2024
- Country:
- Asia
- Myanmar > Tanintharyi Region
- Dawei (0.04)
- Singapore (0.04)
- Europe
- Slovenia > Drava
- Municipality of Benedikt > Benedikt (0.04)
- Switzerland (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- North America > United States
- California (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New York > New York County
- New York City (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Genre:
- Instructional Material > Course Syllabus & Notes (0.46)
- Overview (1.00)
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (0.45)
- Technology: