Learning with Noisy Labels

Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, Ambuj Tewari

Neural Information Processing Systems 

In this paper, we theoretically study the problem of binary classification in the presence of random classification noise -- the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional -- the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a remarkable consequence -- methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with respect to recently proposed methods for dealing with label noise on several benchmark datasets.
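The two loss modifications described in the abstract can be written down concretely. Below is a minimal Python sketch of both, assuming the class-conditional flip rates rho_pos = P(label flipped | true class +1) and rho_neg = P(label flipped | true class -1) are known and satisfy rho_pos + rho_neg < 1; the function names and the logistic base loss are illustrative choices, not code from the paper.

```python
import numpy as np

def logistic_loss(t, y):
    """Base surrogate: logistic loss on score t with label y in {-1, +1}."""
    return np.log1p(np.exp(-y * t))

def unbiased_loss(loss, t, y, rho_pos, rho_neg):
    """Approach 1: unbiased estimator of the clean loss from a noisy label y.

    Returns ((1 - rho_{-y}) * loss(t, y) - rho_y * loss(t, -y))
            / (1 - rho_pos - rho_neg),
    whose expectation over the label flips equals loss(t, true label).
    Note the corrected loss can be negative; what matters is unbiasedness.
    """
    rho_y = rho_pos if y == +1 else rho_neg        # flip rate of the observed class
    rho_minus_y = rho_neg if y == +1 else rho_pos  # flip rate of the opposite class
    return ((1 - rho_minus_y) * loss(t, y)
            - rho_y * loss(t, -y)) / (1 - rho_pos - rho_neg)

def weighted_loss(loss, t, y, rho_pos, rho_neg):
    """Approach 2: alpha-weighted surrogate evaluated on the noisy label.

    With alpha = (1 - rho_pos + rho_neg) / 2, cost-sensitive minimization under
    noisy labels targets the clean Bayes classifier; weighted logistic
    regression and biased SVM fit this weighting scheme.
    """
    alpha = (1 - rho_pos + rho_neg) / 2
    weight = (1 - alpha) if y == +1 else alpha
    return weight * loss(t, y)

# Tiny check of unbiasedness at one point: averaging the corrected loss over
# the noise distribution of the observed label recovers the clean loss.
t, y_true, rho_pos, rho_neg = 0.7, +1, 0.2, 0.4
expected = ((1 - rho_pos) * unbiased_loss(logistic_loss, t, y_true, rho_pos, rho_neg)
            + rho_pos * unbiased_loss(logistic_loss, t, -y_true, rho_pos, rho_neg))
assert np.isclose(expected, logistic_loss(t, y_true))
```

In practice the flip rates are rarely known exactly; the sketch simply illustrates how, once estimates are available, either corrected loss can be dropped into a standard empirical risk minimization loop in place of the original surrogate.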
