kernel classifier
Consistent Non-Parametric Methods for Maximizing Robustness
Learning classifiers that are robust to adversarial examples has received a great deal of recent attention. A major drawback of the standard robust learning framework is there is an artificial robustness radius r that applies to all inputs. This ignores the fact that data may be highly heterogeneous, in which case it is plausible that robustness regions should be larger in some regions of data, and smaller in others. In this paper, we address this limitation by proposing a new limit classifier, called the neighborhood optimal classifier, that extends the Bayes optimal classifier outside its support by using the label of the closest in-support point. We then argue that this classifier maximizes the size of its robustness regions subject to the constraint of having accuracy equal to the Bayes optimal. We then present sufficient conditions under which general non-parametric methods that can be represented as weight functions converge towards this limit, and show that both nearest neighbors and kernel classifiers satisfy them under certain conditions.
The Optimality of Kernel Classifiers in Sobolev Space
Lai, Jianfa, Li, Zhifan, Huang, Dongming, Lin, Qian
Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performances of kernel classifiers. With some mild assumptions on the conditional probability $\eta(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a kernel classifier using recent advances in the theory of kernel regression. We also obtain a minimax lower bound for Sobolev spaces, which shows the optimality of the proposed classifier. Our theoretical results can be extended to the generalization error of overparameterized neural network classifiers. To make our theoretical results more applicable in realistic settings, we also propose a simple method to estimate the interpolation smoothness of $2\eta(x)-1$ and apply the method to real datasets.
Shot-frugal and Robust quantum kernel classifiers
Shastry, Abhay, Jayakumar, Abhijith, Patel, Apoorva, Bhattacharyya, Chiranjib
Quantum kernel methods are a candidate for quantum speed-ups in supervised machine learning. The number of quantum measurements N required for a reasonable kernel estimate is a critical resource, both from complexity considerations and because of the constraints of near-term quantum hardware. We emphasize that for classification tasks, the aim is reliable classification and not precise kernel evaluation, and demonstrate that the former is far more resource efficient. Furthermore, it is shown that the accuracy of classification is not a suitable performance metric in the presence of noise and we motivate a new metric that characterizes the reliability of classification. We then obtain a bound for N which ensures, with high probability, that classification errors over a dataset are bounded by the margin errors of an idealized quantum kernel classifier. Using chance constraint programming and the subgaussian bounds of quantum kernel distributions, we derive several Shot-frugal and Robust (ShofaR) programs starting from the primal formulation of the Support Vector Machine. This significantly reduces the number of quantum measurements needed and is robust to noise by construction. Our strategy is applicable to uncertainty in quantum kernels arising from any source of unbiased noise.
Bayes Risk Consistency of Nonparametric Classification Rules for Spike Trains Data
Pawlak, Mirosław, Pabian, Mateusz, Rzepka, Dominik
Spike trains data find a growing list of applications in computational neuroscience, imaging, streaming data and finance. Machine learning strategies for spike trains are based on various neural network and probabilistic models. The probabilistic approach is relying on parametric or nonparametric specifications of the underlying spike generation model. In this paper we consider the two-class statistical classification problem for a class of spike train data characterized by nonparametrically specified intensity functions. We derive the optimal Bayes rule and next form the plug-in nonparametric kernel classifier. Asymptotical properties of the rules are established including the limit with respect to the increasing recording time interval and the size of a training set. In particular the convergence of the kernel classifier to the Bayes rule is proved. The obtained results are supported by a finite sample simulation studies.
Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers
We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaus(cid:173) sian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the pos(cid:173) sibility to use sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models.
Discriminative Direction for Kernel Classifiers
Once a classifier is estimated from the training data, it can be used to label new examples, and in many application domains, such as character recognition, text classification and oth- ers, this constitutes the final goal of the learning stage. The statistical learning algorithms are also used in scientific studies to detect and analyze differences between the two classes when the correct answer'' is unknown, and the information we have on the differences is represented implicitly by the training set. Example applications include morphologi- cal analysis of anatomical organs (comparing organ shape in patients vs. normal controls), molecular design (identifying complex molecules that satisfy certain requirements), etc. In such applications, interpretation of the resulting classifier in terms of the original feature vectors can provide an insight into the nature of the differences detected by the learning algorithm and is therefore a crucial step in the analysis. Furthermore, we would argue that studying the spatial structure of the data captured by the classification function is important in any application, as it leads to a better understanding of the data and can potentially help in improving the technique.
The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
There has been substantial progress in the past decade in the development of object classifiers for images, for example of faces, humans and vehi- cles. Here we address the problem of contaminations (e.g. Variational inference is used to marginalize over contamination and obtain robust classification. In this way the VIC ap- proach can turn a kernel classifier for clean data into one that can tolerate contamination, without any specific training on contaminated positives. Recent progress in discriminative object detection, especially for faces, has yielded good performance and efficiency [1, 2, 3, 4].
Performance analysis for L\_2 kernel classification
We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the $L_2$ or integrated squared error (ISE) of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the $L_2$ kernel classifier does not involve a regularization parameter. We prove a distribution free concentration inequality for a cross-validation based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier on the sense of both ISE and probability of error. Our results can also be specialized to give performance guarantees for an existing method of $L_2$ kernel density estimation.
The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
We derive here new generalization bounds, based on Rademacher Complexity theory, for model selection and error estimation of linear (kernel) classifiers, which exploit the availability of unlabeled samples. In particular, two results are obtained: the first one shows that, using the unlabeled samples, the confidence term of the conventional bound can be reduced by a factor of three; the second one shows that the unlabeled samples can be used to obtain much tighter bounds, by building localized versions of the hypothesis class containing the optimal classifier.