christmann
Lp- and Risk Consistency of Localized SVMs
Kernel-based regularized risk minimizers, also called support vector machines (SVMs), are known to possess many desirable properties but suffer from their super-linear computational requirements when dealing with large data sets. This problem can be tackled by using localized SVMs instead, which also offer the additional advantage of being able to apply different hyperparameters to different regions of the input space. In this paper, localized SVMs are analyzed with regard to their consistency. It is proven that they inherit $L_p$- as well as risk consistency from global SVMs under very weak conditions, even if the regions underlying the localized SVMs are allowed to change as the size of the training data set increases.
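The localization idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction analyzed in the paper: the input space is the real line, the regions are fixed disjoint intervals, the least-squares loss is swapped in because it has a closed-form solution, and all names (`fit_ls_svm`, `fit_localized`, `region_index`, `edges`, `params`) are hypothetical.

```python
import math
import random

def gaussian_kernel(x, y, gamma):
    """Gaussian RBF kernel on the real line."""
    return math.exp(-gamma * (x - y) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def fit_ls_svm(xs, ys, lam, gamma):
    """Minimise (1/n) * sum (y_i - f(x_i))^2 + lam * ||f||_H^2 over the RKHS H.

    By the representer theorem f = sum_i alpha_i k(., x_i), where alpha
    solves (K + n * lam * I) alpha = y.
    """
    n = len(xs)
    K = [[gaussian_kernel(a, b, gamma) for b in xs] for a in xs]
    for i in range(n):
        K[i][i] += n * lam
    alpha = solve(K, ys)
    return lambda x: sum(a * gaussian_kernel(x, xi, gamma) for a, xi in zip(alpha, xs))

def region_index(x, edges):
    """Index of the interval [edges[k], edges[k+1]) containing x, clamped at both ends."""
    for k in range(len(edges) - 2):
        if x < edges[k + 1]:
            return k
    return len(edges) - 2

def fit_localized(xs, ys, edges, params):
    """Fit one predictor per region; params holds one (lam, gamma) pair per region."""
    buckets = [([], []) for _ in range(len(edges) - 1)]
    for x, y in zip(xs, ys):
        k = region_index(x, edges)
        buckets[k][0].append(x)
        buckets[k][1].append(y)
    models = [fit_ls_svm(bx, by, lam, gamma)
              for (bx, by), (lam, gamma) in zip(buckets, params)]
    # Predict with the local model of the region that contains x.
    return lambda x: models[region_index(x, edges)](x)

def target(x):
    """Toy regression function: wiggly on [0, 1), nearly flat on [1, 2)."""
    return math.sin(2 * math.pi * x) if x < 1.0 else 0.5 * x

random.seed(0)
xs = [2.0 * i / 120 for i in range(120)]
ys = [target(x) + random.gauss(0.0, 0.1) for x in xs]
edges = [0.0, 1.0, 2.0]
params = [(1e-3, 20.0), (1e-3, 1.0)]  # narrow kernel on the left, wide on the right
predict = fit_localized(xs, ys, edges, params)

grid = [0.05 + 0.1 * i for i in range(20)]
mae = sum(abs(predict(x) - target(x)) for x in grid) / len(grid)
print(f"mean absolute error on grid: {mae:.3f}")
```

Each local problem involves only the points of its own region, so the linear systems are smaller than the single global one, and each region can use its own regularization parameter and kernel width. Localized schemes may also use overlapping regions with weighted combinations of the local predictors; the disjoint partition here is only the simplest case.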
On the Connection between $L_p$ and Risk Consistency and its Implications on Regularized Kernel Methods
As a predictor's quality is often assessed by means of its risk, it is natural to regard risk consistency as a desirable property of learning methods, and many such methods have indeed been shown to be risk consistent. The first aim of this paper is to establish the close connection between risk consistency and $L_p$-consistency for a considerably wider class of loss functions than has been done before. The attempt to transfer this connection to shifted loss functions surprisingly reveals that this shift does not reduce the assumptions needed on the underlying probability measure to the same extent as it does for many other results. The results are applied to regularized kernel methods such as support vector machines.
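One well-known special case shows the flavor of such a connection. It is an illustration under added assumptions, not the paper's general result: suppose the loss $L$ is Lipschitz continuous in its last argument with constant $|L|_1$, and the risks of the predictors $f$ and $g$ are finite. The notation $\mathcal{R}_{L,\mathrm{P}}$ for the risk and $\mathrm{P}_X$ for the marginal distribution of the inputs is assumed here.

```latex
\[
  \bigl| \mathcal{R}_{L,\mathrm{P}}(f) - \mathcal{R}_{L,\mathrm{P}}(g) \bigr|
  \;\le\; \int \bigl| L(x,y,f(x)) - L(x,y,g(x)) \bigr| \, d\mathrm{P}(x,y)
  \;\le\; |L|_1 \, \| f - g \|_{L_1(\mathrm{P}_X)} .
\]
```

Consequently, if some $f^*$ attains the Bayes risk and $\| f_n - f^* \|_{L_1(\mathrm{P}_X)} \to 0$ in probability, then $\mathcal{R}_{L,\mathrm{P}}(f_n)$ converges in probability to the Bayes risk. In this setting $L_1$-consistency therefore implies risk consistency.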
Total Stability of SVMs and Localized SVMs
Köhler, Hannes, Christmann, Andreas
Regularized kernel-based methods such as support vector machines (SVMs) typically depend on the underlying probability measure $\mathrm{P}$ (respectively an empirical measure $\mathrm{D}_n$ in applications) as well as on the regularization parameter $\lambda$ and the kernel $k$. Whereas classical statistical robustness only considers the effect of small perturbations in $\mathrm{P}$, the present paper investigates the influence of simultaneous slight variations in the whole triple $(\mathrm{P},\lambda,k)$, respectively $(\mathrm{D}_n,\lambda_n,k)$, on the resulting predictor. Existing results from the literature are considerably generalized and improved. In order to also make them applicable to big data, where regular SVMs suffer from their super-linear computational requirements, we show how our results can be transferred to the context of localized learning. Here, the effect of slight variations in the applied regionalization, which might for example stem from changes in $\mathrm{P}$ respectively $\mathrm{D}_n$, is considered as well.
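The question studied here can be made concrete with a small numerical experiment. The sketch below is an illustration only and proves nothing: it uses the least-squares loss because of its closed-form solution (the paper works under its own assumptions on the loss), a Gaussian RBF kernel whose width `gamma` stands in for the kernel $k$, and hypothetical names throughout. It fits a predictor for a base triple (data, $\lambda$, kernel) and compares it in the supremum norm with predictors fitted after a slight and after a large simultaneous change of all three components.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Gaussian RBF kernel on the real line."""
    return math.exp(-gamma * (x - y) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def fit(xs, ys, lam, gamma):
    """Regularized least-squares kernel predictor: (K + n * lam * I) alpha = y."""
    n = len(xs)
    K = [[gaussian_kernel(a, b, gamma) for b in xs] for a in xs]
    for i in range(n):
        K[i][i] += n * lam
    alpha = solve(K, ys)
    return lambda x: sum(a * gaussian_kernel(x, xi, gamma) for a, xi in zip(alpha, xs))

def sup_gap(f, g, grid):
    """Largest absolute difference between two predictors on a grid."""
    return max(abs(f(x) - g(x)) for x in grid)

xs = [i / 40 for i in range(40)]
ys = [math.sin(2 * math.pi * x) for x in xs]
grid = [i / 200 for i in range(201)]

# Base triple (D_n, lambda, k).
base = fit(xs, ys, lam=0.01, gamma=5.0)

# Slight simultaneous change: one response moved a little, lambda and gamma nudged.
ys_small = ys[:]
ys_small[10] += 0.05
near = fit(xs, ys_small, lam=0.011, gamma=5.1)

# Large simultaneous change in all three components, for comparison.
ys_large = ys[:]
ys_large[10] += 2.0
far = fit(xs, ys_large, lam=0.5, gamma=0.5)

small_gap = sup_gap(base, near, grid)
large_gap = sup_gap(base, far, grid)
print(f"sup-norm gap after slight change: {small_gap:.4f}")
print(f"sup-norm gap after large change:  {large_gap:.4f}")
```

Stability in the sense discussed above corresponds to the first gap shrinking as the perturbation of the triple shrinks; the second gap is included only as a point of comparison.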
On the robustness of kernel-based pairwise learning
Gensler, Patrick, Christmann, Andreas
It is shown that many results on the statistical robustness of kernel-based pairwise learning can be derived under basically no assumptions on the input and output spaces. In particular, neither moment conditions on the conditional distribution of Y given X = x nor the boundedness of the output space is needed. We obtain results on the existence and boundedness of the influence function and show qualitative robustness of the kernel-based estimator. The present paper generalizes results by Christmann and Zhou (2016) by allowing the prediction function to take two arguments and can thus be applied in a variety of situations such as ranking.
Quantitative Robustness of Localized Support Vector Machines
There are many general introductions to these methods from the viewpoint of computer science and statistics. Summarizing textbooks include Cristianini & Shawe-Taylor (2000), Schölkopf & Smola (2001), Cucker & Zhou (2007), and Steinwart & Christmann (2008). These methods have become popular in many fields of science; see, for example, Ma & Guo (2014). The analysis provided by this paper refers to supervised learning, i.e. to classification or regression problems. Beyond this, support vector machines are also a suitable method for unsupervised learning (e.g. novelty detection). The paper can be seen as a sequel to Dumpert & Christmann (2018), where universal consistency and robustness with respect to the maxbias of localized support vector machines have already been shown. This paper is dedicated to refining the robustness analysis. It is organized as follows: Section 2.1 gives a short overview of support vector machines, and Section 2.2 briefly introduces the idea of local approaches. The results concerning the influence function of localized support vector machines are given in Section 3. Finally, Section 4 summarizes the paper.
Total stability of kernel methods
Christmann, Andreas, Xiang, Daohong, Zhou, Ding-Xuan
Regularized empirical risk minimization using kernels and their corresponding reproducing kernel Hilbert spaces (RKHSs) plays an important role in machine learning. However, the actually used kernel often depends on one or on a few hyperparameters or the kernel is even data dependent in a much more complicated manner. Examples are Gaussian RBF kernels, kernel learning, and hierarchical Gaussian kernels which were recently proposed for deep learning. Therefore, the actually used kernel is often computed by a grid search or in an iterative manner and can often only be considered as an approximation to the "ideal" or "optimal" kernel. The paper gives conditions under which classical kernel based methods based on a convex Lipschitz loss function and on a bounded and smooth kernel are stable, if the probability measure $P$, the regularization parameter $\lambda$, and the kernel $k$ may slightly change in a simultaneous manner. Similar results are also given for pairwise learning. Therefore, the topic of this paper is somewhat more general than in classical robust statistics, where usually only the influence of small perturbations of the probability measure $P$ on the estimated function is considered.
Universal Consistency and Robustness of Localized Support Vector Machines
This paper analyses properties of localized kernel-based, nonparametric statistical machine learning methods, in particular of support vector machines (SVMs) and methods close to them. Owing to the enormous research activity, there is an abundance of general introductions to this field of computer science and statistics. Besides many publications in international journals, there are summarizing textbooks such as Cristianini & Shawe-Taylor (2000), Schölkopf & Smola (2001), Steinwart & Christmann (2008), or Cucker & Zhou (2007) from a mathematical or statistical point of view. Nevertheless, we want to give a short overview of the analyzed topic. Support vector machines were initially introduced by Boser, Guyon & Vapnik (1992) and Cortes & Vapnik (1995), based on earlier work like the Russian original of Vapnik, Chervonenkis & Červonenkis (1979).
On the stability of bootstrap estimators
Christmann, Andreas, Salibian-Barrera, Matias, Van Aelst, Stefan
It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space are stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases.