Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy

Feb-11-2025, 15:29:56 GMT–Neural Information Processing Systems

Data pruning, which aims to downsize a large training set into a small informative subset, is crucial for reducing the enormous computational costs of modern deep learning. Though large-scale data collections invariably contain annotation noise and numerous robust learning methods have been developed, data pruning for the noise-robust learning scenario has received little attention. With state-of-the-art Re-labeling methods that self-correct erroneous labels while training, it is challenging to identify which subset induces the most accurate re-labeling of erroneous labels in the entire training set. We first show that the likelihood of a training example being correctly re-labeled is proportional to the prediction confidence of its neighborhood in the subset. Therefore, we propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples, thereby maximizing the re-labeling accuracy and generalization performance.

label noise, maximizing re-labeling accuracy, robust data pruning, (3 more...)

Neural Information Processing Systems

Feb-11-2025, 15:29:56 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)