Early Stopping Against Label Noise Without Validation Data

Suqin Yuan, Lei Feng, Tongliang Liu

arXiv.org Artificial Intelligence 

Sparing more data for validation from the training data limits the performance of the learned model, yet insufficient validation data can result in a sub-optimal selection of the desired model. In this paper, we propose a novel early stopping method called Label Wave, which does not require validation data for selecting the desired model in the presence of label noise. It works by tracking the changes in the model's predictions on the training set during the training process, aiming to halt training before the model unduly fits mislabeled data. This method is empirically supported by our observation that minimum fluctuations in predictions typically occur at the training epoch before the model excessively fits mislabeled data. Through extensive experiments, we show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.

Deep Neural Networks (DNNs) are praised for their remarkable expressive power, which allows them to uncover intricate patterns in high-dimensional data (Montufar et al., 2014; LeCun et al., 2015) and even to fit data with random labels. However, this strength, often termed Memorization (Zhang et al., 2017), can be a double-edged sword, especially in the presence of label noise. When label noise exists, the inherent capacity of DNNs may cause the model to fit mislabeled examples from noisy datasets, which can degrade its generalization performance. Specifically, when DNNs are trained on noisy datasets containing both clean and mislabeled examples, the test error is often observed to first decrease and then increase. To prevent DNNs from overconfidently learning from mislabeled examples, many existing methods for learning with noisy labels (Xia et al., 2019; Han et al., 2020; Song et al., 2022; Huang et al., 2023) explicitly or implicitly halt training before the test error increases, a strategy termed "early stopping". Early stopping relies on model selection: choosing, from the candidate models obtained during training, the one that aligns most closely with the true concept (Mohri et al., 2018; Bai et al., 2021). To this end, leveraging hold-out validation data to pinpoint an appropriate early stopping point has become a prevalent approach in deep learning (Xu & Goodacre, 2018). However, this approach relies on additional validation data that is usually obtained by splitting the training set, thereby reducing the amount of training data and degrading the performance of the learned model.
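
To make the prediction-tracking idea described in the abstract concrete, the sketch below shows one way validation-free early stopping could look in PyTorch: count how many training-set predictions flip between consecutive epochs and keep the checkpoint where this flip count reaches its minimum. This is a minimal illustration, not the paper's Label Wave algorithm; the helpers `train_one_epoch` and `train_loader`, and the patience-based stopping rule on the flip count, are assumptions introduced here for demonstration.

```python
# Minimal sketch (assumed, not the authors' implementation): early stopping by
# tracking epoch-to-epoch changes in the model's predictions on the training set.
import copy
import torch


def predictions_on(model, loader, device="cpu"):
    """Collect the model's hard predictions over the (noisy) training set."""
    model.eval()
    preds = []
    with torch.no_grad():
        for inputs, _ in loader:
            logits = model(inputs.to(device))
            preds.append(logits.argmax(dim=1).cpu())
    return torch.cat(preds)


def train_with_prediction_tracking(model, train_loader, train_one_epoch,
                                   max_epochs=100, patience=5, device="cpu"):
    """Stop when the number of prediction flips between consecutive epochs
    stops decreasing, i.e., near the minimum of prediction fluctuations."""
    prev_preds = None
    best_flips = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_since_best = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, train_loader, device)          # one training pass (assumed helper)
        preds = predictions_on(model, train_loader, device)   # predictions after this epoch

        if prev_preds is not None:
            flips = (preds != prev_preds).sum().item()        # how many predictions changed
            if flips < best_flips:                            # fluctuations still shrinking
                best_flips = flips
                best_state = copy.deepcopy(model.state_dict())
                epochs_since_best = 0
            else:                                             # fluctuations rising again
                epochs_since_best += 1
                if epochs_since_best >= patience:
                    break
        prev_preds = preds

    model.load_state_dict(best_state)                         # return the selected checkpoint
    return model
```

The intuition follows the observation quoted above: prediction fluctuations tend to reach a minimum just before the model starts excessively fitting mislabeled examples, so selecting the checkpoint at that minimum approximates the desired early stopping point without carving a validation set out of the training data.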