clean label
Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks
Collecting large training datasets, annotated with high-quality labels, is costly and time-consuming. This paper proposes a novel framework for training deep convolutional neural networks from noisy labeled datasets that can be obtained cheaply. The problem is formulated using an undirected graphical model that represents the relationship between noisy and clean labels, trained in a semi-supervised setting. In our formulation, the inference over latent clean labels is tractable and is regularized during training using auxiliary sources of information. The proposed model is applied to the image labeling problem and is shown to be effective in labeling unseen images as well as reducing label noise in training on CIFAR-10 and MS COCO datasets.
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling for large datasets, non-expert labeling, and label corruption by data poisoning adversaries. In the latter case, corruptions may be arbitrarily bad, even so bad that a classifier predicts the wrong labels with high confidence. To protect against such sources of noise, we leverage the fact that a small set of clean labels is often easy to procure. We demonstrate that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and propose a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. Across vision and natural language processing tasks, we experiment with various label noises at several strengths, and show that our method significantly outperforms existing methods.
Co-teaching: Robust training of deep neural networks with extremely noisy labels
Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore in this paper, we propose a new deep learning paradigm called ''Co-teaching'' for combating with noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, two networks communicate with each other what data in this mini-batch should be used for training; finally, each network back propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
- North America > United States (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > Japan > Honshū > Tōhoku (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Norway (0.04)
- (2 more...)
- Law (0.68)
- Education (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Nevada (0.04)
- (2 more...)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
- Health & Medicine (0.93)
- Education > Educational Setting (0.32)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)