random classification noise
Learning with Noisy Labels
Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, Ambuj Tewari
In this paper, we theoretically study the problem of binary classification in the presence of random classification noise -- the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional -- the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a very remarkable consequence -- methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with respect to recently proposed methods for dealing with label noise in several benchmark datasets.
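The unbiased-estimator construction referenced above admits a compact implementation. The sketch below is a minimal illustration, not the authors' code: the function names are mine, the logistic loss stands in for an arbitrary surrogate loss, and the class-conditional flip rates are assumed known.

```python
import numpy as np

def logistic_loss(margin):
    """Logistic surrogate loss as a function of the margin t * y."""
    return np.logaddexp(0.0, -margin)

def unbiased_loss(scores, noisy_labels, rho_pos, rho_neg):
    """Unbiased estimator of the clean loss, following Natarajan et al.

    With class-conditional flip rates rho_pos = P(flip | y = +1) and
    rho_neg = P(flip | y = -1), the modified loss
        l_tilde(t, y) = ((1 - rho_{-y}) * l(t, y) - rho_y * l(t, -y))
                        / (1 - rho_pos - rho_neg)
    satisfies E_noise[l_tilde(t, y_noisy)] = l(t, y_clean).
    """
    assert rho_pos + rho_neg < 1.0, "requires rho_pos + rho_neg < 1"
    rho_y = np.where(noisy_labels > 0, rho_pos, rho_neg)      # rho for observed label
    rho_flip = np.where(noisy_labels > 0, rho_neg, rho_pos)   # rho for opposite label
    loss_keep = logistic_loss(scores * noisy_labels)          # l(t, y)
    loss_swap = logistic_loss(-scores * noisy_labels)         # l(t, -y)
    return ((1.0 - rho_flip) * loss_keep - rho_y * loss_swap) / (1.0 - rho_pos - rho_neg)
```

Minimizing the mean of `unbiased_loss` over the training set is then ordinary empirical risk minimization; note that the corrected loss can be negative, and if the noise rates are unknown, rho_pos and rho_neg must themselves be estimated.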
Statistical Active Learning Algorithms
We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns (1993). We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of "uncorrelated" noise. The complexity of the resulting algorithms has information-theoretically optimal quadratic dependence on 1/(1-2\eta), where \eta is the noise rate. We demonstrate the power of our framework by showing that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first known computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error \epsilon over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case.
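The 1/(1-2\eta) dependence comes from a standard debiasing identity: under random classification noise with rate \eta < 1/2, E[\tilde{y} g(x)] = (1 - 2\eta) E[y g(x)] for any query g, so an estimate computed from noisy labels can be corrected by dividing by 1 - 2\eta, at the cost of variance that blows up as \eta approaches 1/2. A minimal sketch of this correction for a single statistical query, my own illustration rather than code from the paper:

```python
import numpy as np

def corrected_sq_estimate(g_values, noisy_labels, eta):
    """Debias an estimate of E[y * g(x)] computed from noisy labels.

    Under random classification noise, E[y_noisy * g(x)] = (1 - 2*eta) * E[y * g(x)],
    so dividing the empirical noisy average by (1 - 2*eta) removes the bias.
    """
    assert 0.0 <= eta < 0.5, "correction requires eta < 1/2"
    return np.mean(noisy_labels * g_values) / (1.0 - 2.0 * eta)

# Toy check: 1-D threshold, y = sign(x), query g(x) = x.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = np.sign(x)
eta = 0.3
y_noisy = np.where(rng.random(x.shape) < eta, -y, y)
print(corrected_sq_estimate(x, y_noisy, eta))  # ~ E[|x|] = sqrt(2/pi) ~ 0.80
print(np.mean(y * x))                          # clean-label reference
```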
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is \widetilde{\Theta}(d/\epsilon), where d is the dimension and \epsilon is the excess error. Our positive result is a computationally efficient learning algorithm with sample complexity \widetilde{O}(d/\epsilon + d/(\max(p, \epsilon))^2), where p quantifies the bias of the target halfspace. On the lower bound side, we show that any efficient SQ algorithm (or low-degree test) for the problem requires sample complexity at least \Omega(d^{1/2}/(\max(p, \epsilon))^2).
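Placing the bounds side by side in the regime p \le \epsilon, where \max(p, \epsilon) = \epsilon, makes the gap explicit (my own rearrangement of the quantities above):

```latex
\[
\underbrace{\widetilde{\Theta}\!\left(\frac{d}{\epsilon}\right)}_{\text{information-theoretic rate}}
\quad\text{vs.}\quad
\underbrace{\widetilde{O}\!\left(\frac{d}{\epsilon} + \frac{d}{\epsilon^{2}}\right)
  = \widetilde{O}\!\left(\frac{d}{\epsilon^{2}}\right)}_{\text{efficient upper bound}}
\quad\text{and}\quad
\underbrace{\Omega\!\left(\frac{d^{1/2}}{\epsilon^{2}}\right)}_{\text{SQ / low-degree lower bound}}
\]
```

That is, efficient algorithms appear to pay roughly an extra factor of 1/\epsilon over the statistically optimal rate.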
Approximating the Number of Relevant Variables in a Parity Implies Proper Learning
Bshouty, Nader H., Haddad, George
Consider the model where we can access a parity function through random uniform labeled examples in the presence of random classification noise. In this paper, we show that approximating the number of relevant variables in the parity function is as hard as properly learning parities. More specifically, let $\gamma:{\mathbb R}^+\to {\mathbb R}^+$, where $\gamma(x) \ge x$, be any strictly increasing function. In our first result, we show that from any polynomial-time algorithm that returns a $\gamma$-approximation, $D$ (i.e., $\gamma^{-1}(d(f)) \leq D \leq \gamma(d(f))$), of the number of relevant variables $d(f)$ for any parity $f$, we can, in polynomial time, construct a solution to the long-standing open problem of polynomial-time learning $k(n)$-sparse parities (parities with $k(n)\le n$ relevant variables), where $k(n) = \omega_n(1)$. In our second result, we show that from any $T(n)$-time algorithm that, for any parity $f$, returns a $\gamma$-approximation of the number of relevant variables $d(f)$ of $f$, we can, in polynomial time, construct a $poly(\Gamma(n))T(\Gamma(n)^2)$-time algorithm that properly learns parities, where $\Gamma(x)=\gamma(\gamma(x))$. If $T(\Gamma(n)^2)=\exp({o(n/\log n)})$, this would resolve another long-standing open problem of properly learning parities in the presence of random classification noise in time $\exp({o(n/\log n)})$.
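To make the second result concrete, here is a worked instantiation with the linear approximation factor \gamma(x) = 2x, i.e., an algorithm returning D with d(f)/2 \le D \le 2 d(f); the arithmetic is mine, derived from the statement above:

```latex
\[
\gamma(x) = 2x \;\Rightarrow\; \Gamma(x) = \gamma(\gamma(x)) = 4x,
\qquad \Gamma(n)^{2} = 16n^{2},
\]
so a $T(n)$-time 2-approximator of $d(f)$ yields a proper parity learner running in time
\[
\mathrm{poly}(4n)\cdot T(16n^{2}),
\]
and the hypothesis $T(\Gamma(n)^{2}) = \exp(o(n/\log n))$ holds whenever
$T(m) = \exp\!\big(o(m^{1/2}/\log m)\big)$.
```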
Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise
Diakonikolas, Ilias, Diakonikolas, Jelena, Kane, Daniel M., Wang, Puqian, Zarifis, Nikos
We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. We start by giving a simple efficient algorithm with sample complexity $\widetilde{O}(1/(\gamma^2 \epsilon^2))$. Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/\epsilon$ in the sample complexity is inherent for computationally efficient algorithms. Specifically, our results imply a lower bound of $\widetilde{\Omega}(1/(\gamma^{1/2} \epsilon^2))$ on the sample complexity of any efficient SQ learner or low-degree test.
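Plugging concrete values into the stated bounds, say \gamma = 1/2 and \epsilon = 10^{-4} (my own arithmetic), shows the scale of the tradeoff:

```latex
\begin{align*}
\text{statistically optimal:} &\quad \widetilde{\Theta}\!\left(\tfrac{1}{\gamma^{2}\epsilon}\right) \approx 4\times 10^{4} \text{ samples},\\
\text{efficient algorithm:} &\quad \widetilde{O}\!\left(\tfrac{1}{\gamma^{2}\epsilon^{2}}\right) \approx 4\times 10^{8} \text{ samples},\\
\text{SQ / low-degree barrier:} &\quad \widetilde{\Omega}\!\left(\tfrac{1}{\gamma^{1/2}\epsilon^{2}}\right) \approx 1.4\times 10^{8} \text{ samples}.
\end{align*}
```

The gap between the first and third lines, roughly a factor of 1/\epsilon up to \gamma factors, is the information-computation tradeoff the paper identifies.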
The perils of being unhinged: On the accuracy of classifiers minimizing a noise-robust convex loss
Long, Philip M., Servedio, Rocco A.
As van Rooyen et al. noted in the first sentence of the abstract of [vRMW15], "Convex potential minimisation is the de facto approach to binary classification." Given the ubiquity of this approach, it is natural to study its abilities and limitations in the presence of noise, and indeed this is the subject of many works (see e.g.
Noise in Classification
Balcan, Maria-Florina, Haghtalab, Nika
Machine learning studies automatic methods for making accurate predictions and useful decisions based on previous observations and experience. From the application point of view, machine learning has become a successful discipline for operating in complex domains such as natural language processing, speech recognition, and computer vision. Moreover, the theoretical foundations of machine learning have led to the development of powerful and versatile techniques, which are routinely used in a wide range of commercial systems in today's world. However, a major challenge of increasing importance in the theory and practice of machine learning is to provide algorithms that are robust to adversarial noise. In this chapter, we focus on classification where the goal is to learn a classification rule from labeled examples only.