Confidence HNC: A Network Flow Technique for Binary Classification with Noisy Labels

arXiv.org Artificial Intelligence

The performance of machine learning models depends to a great extent on data quality and, in particular, on the reliability of the labels. Label noise can substantially degrade the outcome of learning methods and has received considerable attention from the research community. Semi-supervised learning is a class of methods that use information from unlabeled data in addition to labeled data, and such methods are often used where labeled data is scarce or costly [Zhu and Goldberg, 2009]. By counterbalancing the effect of possibly noisy labeled data with information from unlabeled data, these methods also have the potential to mitigate label noise, on top of their advantage when only a limited number of labeled samples is available. A particular class of semi-supervised methods that we are interested in is the class of network-flow-based, or graph-based, methods, in which a minimum cut solution on a graph representation of the data provides the label predictions for unlabeled samples. Unlabeled samples assist the method through their connectivity with labeled samples, as well as through their connectivity among themselves.
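
To make the minimum cut idea concrete, the following is a minimal sketch of a generic graph-based labeling scheme; it is not the Confidence HNC method itself. It assumes that positively labeled samples are attached to a source node and negatively labeled samples to a sink node, that pairwise similarities act as edge capacities, and that unlabeled samples take the label of the side of the cut they fall on. The function name `min_cut_labels`, the toy similarity weights, and the use of a plain Edmonds-Karp maximum flow routine are all illustrative assumptions.

```python
from collections import defaultdict, deque


def min_cut_labels(similarity, positive, negative, n):
    """Label n samples (indexed 0..n-1) by a minimum s-t cut.

    similarity: dict {(i, j): weight} of undirected similarity edges (assumed input).
    positive / negative: sets of labeled sample indices.
    Returns a list with +1 for source-side samples and -1 for the rest.
    """
    S, T = "s", "t"
    cap = defaultdict(lambda: defaultdict(float))
    for (i, j), w in similarity.items():
        cap[i][j] += w  # undirected edge modeled as two arcs
        cap[j][i] += w
    # Label arcs get a capacity larger than any cut made only of similarity
    # edges, so in this sketch a given label is never overturned.
    big = sum(similarity.values()) + 1.0
    for i in positive:
        cap[S][i] += big
    for i in negative:
        cap[i][T] += big

    def residual_search():
        # Breadth-first search over arcs with positive residual capacity.
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = residual_search()
        if T not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the path from S to T.
        bottleneck = float("inf")
        v = T
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # Push the bottleneck amount of flow along that path.
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from S form the source side of a minimum cut.
    return [1 if i in parent else -1 for i in range(n)]


# Toy chain of six samples with one weak link between samples 2 and 3.
sim = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7, (4, 5): 0.9}
print(min_cut_labels(sim, positive={0}, negative={5}, n=6))
# prints [1, 1, 1, -1, -1, -1]
```

In this toy chain the cut separates the samples at the weakest similarity edge, so the unlabeled samples 1 to 4 inherit labels purely through their connectivity to the two labeled samples and to each other. Samples that are not connected to any positively labeled sample end up on the sink side by default, which is a simplification of this sketch.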