Goto

Collaborating Authors

 ramp loss



On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

Neural Information Processing Systems

We consider the class of noisy multi-layered sigmoid recurrent neural networks with w (unbounded) weights for classification of sequences of length T, where independent noise distributed according to N(0,σ2)is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by O(wlog(T/σ)). For the non-noisy version of the same class (i.e., σ = 0), we prove a lower bound of Ω(wT) for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on T for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on 1/σ, this gap still holds even for numerically negligible values of σ.1





Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper provides some theoretical analysis on Positive and Unlabeled Data learning (PU), when only the positive instances and unlabeled instances are available. The authors show that learning from PU is equivalent to a cost-sensitive classification task if the label prior is known. The main contribution of the paper is that, the authors show that using any convex loss leads to inconsistent classifier for PU tasks. Instead, using a non-convex ramp loss gives a consistent estimator. This theoretical justification is supported by experiments, which demonstrate that adopting hinge loss of SVM may result in very bad classification error comparing to using the non-convex ramp loss.


Analysis of Learning from Positive and Unlabeled Data

Neural Information Processing Systems

Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundary due to an intrinsic bias, but the problem can be avoided by using non-convex loss functions such as the ramp loss. We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset). Finally, we provide generalization error bounds and show that, for an equal number of labeled and unlabeled samples, the generalization error of learning only from positive and unlabeled samples is no worse than 2 2 times the fully supervised case. These theoretical findings are also validated through experiments.


Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss

Neural Information Processing Systems

We consider latent structural versions of probit loss and ramp loss. We show that these surrogate loss functions are consistent in the strong sense that for any feature map (finite or infinite dimensional) they yield predictors approaching the infimum task loss achievable by any linear predictor over the given features. We also give finite sample generalization bounds (convergence rates) for these loss functions. These bounds suggest that probit loss converges more rapidly. However, ramp loss is more easily optimized on a given sample.


On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

arXiv.org Artificial Intelligence

We consider the class of noisy multi-layered sigmoid recurrent neural networks with $w$ (unbounded) weights for classification of sequences of length $T$, where independent noise distributed according to $\mathcal{N}(0,\sigma^2)$ is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by $O (w\log(T/\sigma))$. For the non-noisy version of the same class (i.e., $\sigma=0$), we prove a lower bound of $\Omega (wT)$ for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on $T$ for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on $1/\sigma$, this gap still holds even for numerically negligible values of $\sigma$.


Satisfying Real-world Goals with Dataset Constraints

Neural Information Processing Systems

The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.