Data-driven calibration of linear estimators with minimal penalties

arXiv.org Machine Learning

This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression, spline smoothing or locally weighted regression, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first consistently estimates the noise variance, based upon the concept of minimal penalty, previously introduced in the context of model selection. Plugging our variance estimate into Mallows' $C_L$ penalty is then proved to yield an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly on existing calibration procedures such as generalized cross-validation.
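
Below is a minimal sketch, in Python with NumPy, of the second step described above: selecting the regularization parameter of a kernel ridge smoother by Mallows' $C_L$ once a variance estimate is available. The Gaussian-kernel setup, the grid of candidate parameters, and the placeholder value `sigma2_hat` are illustrative assumptions; the paper's minimal-penalty variance estimator itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + 0.3 * rng.normal(size=n)

# Gaussian kernel matrix of the training inputs
K = np.exp(-((X - X.T) ** 2) / (2.0 * 0.2 ** 2))

lambdas = np.logspace(-4, 1, 30)   # candidate regularization parameters
sigma2_hat = 0.09                  # placeholder: noise variance assumed already estimated

scores = []
for lam in lambdas:
    # Smoother matrix of kernel ridge regression:
    # A_lambda = (K + n*lam*I)^{-1} K  (equal to K (K + n*lam*I)^{-1} since K is symmetric)
    A = np.linalg.solve(K + n * lam * np.eye(n), K)
    resid = y - A @ y
    # Mallows' C_L: residual sum of squares + 2 * sigma^2 * tr(A_lambda)
    scores.append(resid @ resid + 2.0 * sigma2_hat * np.trace(A))

best_lam = lambdas[int(np.argmin(scores))]
print(f"selected lambda: {best_lam:.4g}")
```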


Singular ridge regression with homoscedastic residuals: generalization error with estimated parameters

arXiv.org Machine Learning

This paper characterizes the conditional distribution properties of the finite-sample ridge regression estimator and uses that result to evaluate total regression and generalization errors that account for the inaccuracies incurred when estimating the parameters. The paper provides explicit formulas for those errors. Unlike other classical references in this setup, our results hold in a fully singular setup that does not assume the existence of a solution for the non-regularized regression problem. In exchange, we invoke a conditional homoscedasticity hypothesis on the regularized regression residuals that is crucial in our developments.
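
As a quick illustration of the "fully singular setup" mentioned above, the NumPy sketch below (a toy example of my own, not taken from the paper) builds a design with more covariates than observations, so the unregularized normal equations have no unique solution, while the ridge estimator remains well defined.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 200, 1.0             # fewer observations than covariates
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + 0.1 * rng.normal(size=n)

gram = X.T @ X
# The Gram matrix is rank-deficient, so X'X b = X'y has no unique solution
print("rank(X'X) =", np.linalg.matrix_rank(gram), "< p =", p)

# The ridge estimator (X'X + lam*I)^{-1} X'y is still well defined
beta_ridge = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)
print("ridge fit residual norm:", np.linalg.norm(y - X @ beta_ridge))
```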


Optimal tuning for divide-and-conquer kernel ridge regression with massive data

arXiv.org Machine Learning

We propose the first data-driven tuning procedure for divide-and-conquer kernel ridge regression (Zhang et al., 2015). While the proposed criterion is computationally scalable for massive data sets, it is also shown to be asymptotically optimal under mild conditions. The effectiveness of our method is illustrated by extensive simulations and an application to the Million Song Dataset.
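
For context, the following Python sketch implements the divide-and-conquer kernel ridge regression estimator of Zhang et al. (2015) that the tuning procedure targets: the data are split into blocks, a kernel ridge regressor is fit on each block with a shared regularization parameter, and the block predictions are averaged. The function name `dc_krr_predict`, the synthetic data, and the fixed `alpha`/`gamma` values are illustrative assumptions; the paper's data-driven choice of the tuning parameter is not shown.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def dc_krr_predict(X, y, X_test, m=10, alpha=1e-3, gamma=10.0):
    """Average the predictions of m kernel ridge regressors fit on disjoint blocks."""
    n = len(y)
    blocks = np.array_split(np.random.permutation(n), m)
    preds = np.zeros(len(X_test))
    for idx in blocks:
        model = KernelRidge(alpha=alpha, kernel="rbf", gamma=gamma)
        model.fit(X[idx], y[idx])
        preds += model.predict(X_test)
    return preds / m   # average of the m local estimators

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(5000, 1))
y = np.sin(4 * X[:, 0]) + 0.2 * rng.normal(size=5000)
X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
y_hat = dc_krr_predict(X, y, X_test, m=10)
```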


Tuning-free ridge estimators for high-dimensional generalized linear models

arXiv.org Machine Learning

Ridge estimators regularize the squared Euclidean lengths of parameters. Such estimators are mathematically and computationally attractive but involve tuning parameters that can be difficult to calibrate. In this paper, we show that ridge estimators can be modified so that tuning parameters are avoided altogether. We also show that these modified versions can improve on the empirical prediction accuracy of standard ridge estimators combined with cross-validation, and we provide a first set of theoretical guarantees.
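
For comparison, the sketch below shows the kind of baseline the abstract refers to: a ridge-penalized (here logistic) generalized linear model whose tuning parameter is selected by cross-validation, using scikit-learn's `LogisticRegressionCV`. The simulated data and the parameter grid are illustrative assumptions; the paper's tuning-free modification itself is not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(3)
n, p = 200, 500                          # high-dimensional setting: p > n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 0.5
y = (X @ beta + rng.logistic(size=n) > 0).astype(int)

# Cross-validated grid over the inverse penalty strength C = 1 / lambda
clf = LogisticRegressionCV(Cs=20, cv=5, penalty="l2", max_iter=5000).fit(X, y)
print("CV-selected C:", clf.C_[0])
```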


Implicit ridge regularization provided by the minimum-norm least squares estimator when $n\ll p$

arXiv.org Machine Learning

Conventional wisdom in statistical learning holds that large models require strong regularization to prevent overfitting. This rule has recently been challenged by deep neural networks: despite being expressive enough to fit any training set perfectly, they still generalize well. Here we show that the same is true for linear regression in the under-determined $n\ll p$ situation, provided that one uses the minimum-norm estimator. The case of a linear model with least-squares loss allows a full and exact mathematical analysis. We prove that augmenting a model with many random covariates of small constant variance and using the minimum-norm estimator is asymptotically equivalent to adding a ridge penalty. Using toy simulations as well as real-life high-dimensional data sets, we demonstrate that an explicit ridge penalty often fails to provide any improvement over this implicit ridge regularization. In this regime, the minimum-norm estimator achieves zero training error but nevertheless has low expected error.
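
The claimed equivalence can be checked numerically with a few lines of NumPy. The toy check below (my own sketch, with an illustrative choice of dimensions and of the added-covariate scale `tau`) augments the design with $q$ small-variance random columns, takes the minimum-norm least-squares solution, and compares its coefficients on the original covariates with an explicit ridge fit using $\lambda \approx q\tau^2$, the value suggested by the asymptotic argument as $q$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 50, 20, 2000           # n samples, p real covariates, q random extras
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + 0.5 * rng.normal(size=n)

tau = 0.05                        # small constant scale of the added covariates
X_aug = np.hstack([X, tau * rng.normal(size=(n, q))])

# Minimum-norm least-squares solution on the augmented, under-determined design
# (np.linalg.lstsq returns the minimum-norm solution); keep the original p coefficients
beta_aug = np.linalg.lstsq(X_aug, y, rcond=None)[0][:p]

# Explicit ridge fit on the original design with lambda = q * tau^2
lam = q * tau ** 2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("max |min-norm(augmented) - ridge| =", np.abs(beta_aug - beta_ridge).max())
```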