Optimal Rates for Learning with Nystr\"om Stochastic Gradient Methods
Lin, Junhong, Rosasco, Lorenzo
In supervised learning, given a sample of n pairs of inputs and outputs, the goal is to estimate a function to be used to predict future outputs based on observing only the corresponding inputs. The quality of an estimate is often measured in terms of the mean-squared prediction error, in which case the regression function is optimal. Since the properties of the function to be estimated are not known a priori, nonparametric techniques, that can adapt their complexity to the problem at hand, are often key to good results. Kernel methods [15, 36] are probably the most common nonparametric approaches to learning. They are based on choosing a reproducing kernel Hilbert space (RKHS) as the hypothesis space in the design of learning algorithms. A classical learning algorithm using kernel methods to perform learning tasks is kernel ridge regression (KRR), which is based on minimizing the sum of a data-fitting term and an explicit penalty term. The penalty term is used for regularization, and controls the complexity of the solution, preventing overfitting. The statistical properties of KRR have been studied extensively, see e.g.
Oct-21-2017