Flat Minima in Linear Estimation and an Extended Gauss Markov Theorem

Segert, Simon

arXiv.org Machine Learning 

We consider the problem of linear estimation, and establish an extension of the Gauss-Markov theorem in which the bias operator is allowed to be non-zero but bounded with respect to a matrix norm of Schatten type. We derive simple and explicit formulas for the optimal estimator in the cases of the Nuclear and Spectral norms (with the Frobenius case recovering ridge regression). Additionally, we analytically derive the generalization error in multiple random matrix ensembles and compare with Ridge regression. Finally, we conduct an extensive simulation study, in which we show that the cross-validated Nuclear and Spectral regressors can outperform Ridge in several circumstances.

Linear models are among the most widely used machine-learning models in applications to science and engineering. In addition to this practical interest, they are also of great theoretical interest. Indeed, the empirical successes of neural networks have proven somewhat at odds with the received wisdom of classical statistical learning theory (Belkin et al., 2019). This has begun to be reconciled through careful analysis of well-chosen linear "model systems" such as high-dimensional regression (Hastie et al., 2019), kernel ridge regression (Canatar et al., 2021; Jacot et al., 2020), and linear neural networks (Saxe et al., 2013), which reproduce many qualitative properties of the learning dynamics of neural networks. Thus, the careful study of linear models, especially in the limit of a large number of features and observations, can prove highly valuable for a qualitative understanding of non-linear models.
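To make the construction in the abstract concrete, the following is a minimal sketch of the setup under assumed notation (the estimation matrix M, the bound c, and the trace-of-covariance objective are illustrative choices, not taken verbatim from the paper). In the standard linear model

\[ y = X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Cov}(\varepsilon) = \sigma^2 I, \]

a linear estimator $\hat\beta = My$ has bias operator $MX - I$, since $\mathbb{E}[\hat\beta] - \beta = (MX - I)\beta$. The classical Gauss-Markov theorem seeks the minimum-variance linear estimator subject to $MX - I = 0$; the extension described in the abstract instead bounds the bias operator in a Schatten norm,

\[ \min_{M} \; \operatorname{tr}\operatorname{Cov}(My) \quad \text{subject to} \quad \lVert MX - I \rVert_{S_p} \le c, \]

with $p = 2$ (Frobenius), $p = 1$ (Nuclear), and $p = \infty$ (Spectral). In the Frobenius case the optimum is, as the abstract notes, the familiar ridge estimator

\[ \hat\beta_{\mathrm{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y, \]

with the regularization strength $\lambda$ determined by the bound $c$.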
