Learning from Noisy Labels via Self-Taught On-the-Fly Meta Loss Rescaling
Heck, Michael, Geishauser, Christian, Lubis, Nurul, van Niekerk, Carel, Feng, Shutong, Lin, Hsien-Chin, Ruppik, Benjamin Matthias, Vukovic, Renato, Gašić, Milica
arXiv.org Artificial Intelligence
Correct labels are indispensable for training effective machine learning models. However, creating high-quality labels is expensive, and even professionally labeled data contains errors and ambiguities. Filtering and denoising can be applied to curate labeled data prior to training, at the cost of additional processing and loss of information. An alternative is on-the-fly sample reweighting during the training process to decrease the negative impact of incorrect or ambiguous labels, but this typically requires clean seed data. In this work we propose unsupervised on-the-fly meta loss rescaling to reweight training samples. Crucially, we rely only on features provided by the model being trained, to learn a rescaling function in real time without knowledge of the true clean data distribution. We achieve this via a novel meta learning setup that samples validation data for the meta update directly from the noisy training corpus by employing the rescaling function being trained. Our proposed method consistently improves performance across various NLP tasks with minimal computational overhead. Further, we are among the first to attempt on-the-fly training data reweighting on the challenging task of dialogue modeling, where noisy and ambiguous labels are common. Our strategy is robust in the face of noisy and clean data, handles class imbalance, and prevents overfitting to noisy labels. Our self-taught loss rescaling improves as the model trains, showing the ability to keep learning from the model's own signals. As training progresses, the impact of correctly labeled data is scaled up, while the impact of wrongly labeled data is suppressed.
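The abstract does not give implementation details, so the following is only a toy sketch of two general ideas it mentions: rescaling per-sample losses with weights, and drawing a pseudo-validation batch from the noisy training data according to those same weights. It is not the authors' method. The function names `rescale_losses` and `sample_validation_indices`, the example losses, and the example weights are all hypothetical; in the paper the weights would come from a learned rescaling function rather than being fixed by hand.

```python
import random


def rescale_losses(losses, weights):
    """Return the weighted mean of per-sample losses.

    Assumption: weights are non-negative scores, where a low weight
    marks a sample whose label is suspected to be noisy.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total


def sample_validation_indices(weights, k, rng):
    """Draw k indices (with replacement) with probability proportional to weight.

    Assumption: this stands in for "sampling validation data from the
    noisy training corpus"; the real sampling scheme may differ.
    """
    return rng.choices(range(len(weights)), weights=weights, k=k)


# Hypothetical batch: the third sample has a high loss and a low weight,
# as one might expect for a mislabeled example.
losses = [0.2, 0.3, 2.5, 0.1]
weights = [0.9, 0.8, 0.1, 1.0]

plain_loss = sum(losses) / len(losses)
rescaled_loss = rescale_losses(losses, weights)

rng = random.Random(0)
validation_indices = sample_validation_indices(weights, k=8, rng=rng)
```

In this toy batch the rescaled loss is lower than the plain mean because the suspicious sample contributes little, and that sample is also unlikely to be drawn into the pseudo-validation batch.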
Dec-17-2024
- Country:
  - Asia > Singapore (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
  - North America
    - United States
      - New York > New York County
        - New York City (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - California
        - Los Angeles County > Long Beach (0.14)
        - San Diego County > San Diego (0.04)
    - Canada
      - Quebec > Montreal (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
    - Switzerland (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
    - Italy > Sicily
      - Palermo (0.04)
    - Germany > North Rhine-Westphalia
      - Düsseldorf Region > Düsseldorf (0.04)
    - France > Hauts-de-France
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
- Genre:
  - Research Report (0.64)
- Technology: