Optimal Rates for Learning with Nystr\"om Stochastic Gradient Methods

Oct-21-2017–arXiv.org Machine Learning

In supervised learning, given a sample of n pairs of inputs and outputs, the goal is to estimate a function to be used to predict future outputs based on observing only the corresponding inputs. The quality of an estimate is often measured in terms of the mean-squared prediction error, in which case the regression function is optimal. Since the properties of the function to be estimated are not known a priori, nonparametric techniques, that can adapt their complexity to the problem at hand, are often key to good results. Kernel methods [15, 36] are probably the most common nonparametric approaches to learning. They are based on choosing a reproducing kernel Hilbert space (RKHS) as the hypothesis space in the design of learning algorithms. A classical learning algorithm using kernel methods to perform learning tasks is kernel ridge regression (KRR), which is based on minimizing the sum of a data-fitting term and an explicit penalty term. The penalty term is used for regularization, and controls the complexity of the solution, preventing overfitting. The statistical properties of KRR have been studied extensively, see e.g.

artificial intelligence, lemma 5, machine learning, (18 more...)

arXiv.org Machine Learning

Oct-21-2017

arXiv.org PDF

Add feedback

Country:
- North America > United States > Massachusetts (0.28)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.50)
  - Performance Analysis > Accuracy (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found