 kernel alignment risk estimator





Review for NeurIPS paper: Kernel Alignment Risk Estimator: Risk Prediction from Training Data

Neural Information Processing Systems

Weaknesses: I have read the rebuttal and the other reviewers' comments. The rebuttal addressed my questions, so I increased my score to 7. Nevertheless, I suggest that the authors carefully reorganize the paper for clarity in the final version. The quality of this paper is good, but I have some concerns. Motivation and related work: The Gaussianity assumption: this paper considers Gaussian data, following previous work in RMT, e.g., [17,20]. The following references also focus on the risk convergence of (centered) KRR via RMT in an asymptotic regime. This is acceptable, but in that case the motivation in the introduction should be presented more clearly in comparison with current results in RMT.


Kernel Alignment Risk Estimator: Risk Prediction from Training Data

Neural Information Processing Systems

We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel $K$ with ridge $\lambda>0$ and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT $\vartheta_{K,\lambda}$ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor captures, and to approximate the (expected) KRR risk. This then leads to a KRR risk approximation by the KARE $\rho_{K, \lambda}$, an explicit function of the training data, agnostic of the true data distribution. The key results then follow from a finite-size adaptation of the resolvent method for general Wishart random matrices.


Kernel Alignment Risk Estimator: Risk Prediction from Training Data

Jacot, Arthur, Şimşek, Berfin, Spadaro, Francesco, Hongler, Clément, Gabriel, Franck

arXiv.org Machine Learning

We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel $K$ with ridge $\lambda>0$ and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT $\vartheta_{K,\lambda}$ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor captures, and to approximate the (expected) KRR risk. This then leads to a KRR risk approximation by the KARE $\rho_{K, \lambda}$, an explicit function of the training data, agnostic of the true data distribution. We phrase the regression problem in a functional setting. The key results then follow from a finite-size analysis of the Stieltjes transform of general Wishart random matrices. Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations), we capture the mean and variance of the KRR predictor. We numerically investigate our findings on the Higgs and MNIST datasets for various classical kernels: the KARE gives an excellent approximation of the risk, thus supporting our universality assumption. Using the KARE, one can compare choices of kernels and hyperparameters directly from the training set. The KARE thus provides a promising data-dependent procedure to select kernels that generalize well.
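
To make "an explicit function of the training data" concrete, here is a minimal NumPy sketch. The abstract does not state the formula, so the sketch assumes the KARE takes the form $\rho_{K,\lambda} = \frac{\frac{1}{N} y^T (\frac{1}{N}G + \lambda I_N)^{-2} y}{\left(\frac{1}{N}\mathrm{Tr}[(\frac{1}{N}G + \lambda I_N)^{-1}]\right)^2}$, where $G$ is the $N \times N$ Gram matrix of the kernel on the training inputs and $y$ is the vector of training labels. That expression should be checked against the paper before use. The helper names `rbf_gram` and `kare`, the RBF kernel, and the synthetic data are illustrative choices, not taken from the source.

```python
import numpy as np


def rbf_gram(X, lengthscale=1.0):
    """Gram matrix of an RBF kernel on the rows of X (illustrative kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def kare(G, y, lam):
    """Assumed KARE formula (see lead-in):

        (1/N) y^T (G/N + lam I)^{-2} y  /  ((1/N) Tr[(G/N + lam I)^{-1}])^2

    G is the symmetric N x N Gram matrix, y the label vector, lam > 0 the ridge.
    """
    N = G.shape[0]
    # Eigendecomposition of the symmetric matrix G/N lets us apply the
    # inverse and the squared inverse without forming them explicitly.
    eigvals, eigvecs = np.linalg.eigh(G / N)
    inv_shifted = 1.0 / (eigvals + lam)
    y_proj = eigvecs.T @ y
    numerator = np.sum((y_proj * inv_shifted) ** 2) / N
    denominator = np.mean(inv_shifted) ** 2
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic regression data, used only to exercise the functions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

    # Compare kernel hyperparameters using the training set alone.
    for lengthscale in (0.5, 1.0, 2.0, 4.0):
        score = kare(rbf_gram(X, lengthscale), y, lam=1e-2)
        print(f"lengthscale={lengthscale}: KARE={score:.4f}")
```

Under this assumed formula, choosing the lengthscale (or the ridge) with the smallest KARE value is the kind of training-set-only comparison the abstract describes. A quick sanity check: when $G = N I_N$ the expression reduces to the mean of the squared labels, independently of $\lambda$.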