New Equivalences Between Interpolation and SVMs: Kernels and Structured Features

Kaushik, Chiraag, McRae, Andrew D., Davenport, Mark A., Muthukumar, Vidya

arXiv.org Artificial Intelligence 

Recent empirical and theoretical efforts in supervised machine learning have discovered a wide range of surprising phenomena that arise in the modern overparameterized regime (i.e., where the number of free parameters in the model is much larger than the number of training examples [13, 6]). For example, after it was observed that deep neural networks can perfectly fit noisy training data and still generalize well to new data (see, e.g., [35, 43]), several theoretical efforts have demonstrated that this "harmless interpolation" phenomenon can in fact occur even in the simpler settings of linear and kernel regression [8, 7, 5]. A separate, but equally surprising, observation in this overparameterized regime is that training procedures that optimize different loss functions can still yield similar test performance. For example, the empirical studies of [36, 22, 26, 16] demonstrate that kernel machines and deep neural networks trained using the squared loss, which is traditionally reserved for regression problems with continuous labels, can achieve classification performance comparable to that of models trained with the more popular cross-entropy loss. Motivated by these observations, recent work has sought to deepen theoretical understanding of the impact of the loss function in overparameterized classification tasks, starting with linear models.
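The loss-equivalence observation above can be illustrated with a minimal sketch (not taken from the paper): in an overparameterized linear model, the minimum-norm interpolator of the ±1 labels (squared loss) and a classifier trained by gradient descent on the logistic (cross-entropy) loss often attain similar test accuracy. The dimensions, noise level, and optimizer settings below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 50, 500, 1000            # overparameterized: d >> n (illustrative sizes)
w_star = rng.normal(size=d) / np.sqrt(d)  # hypothetical ground-truth direction

def make_data(m):
    X = rng.normal(size=(m, d))
    y = np.sign(X @ w_star + 0.1 * rng.normal(size=m))  # noisy +/-1 labels
    return X, y

X_tr, y_tr = make_data(n)
X_te, y_te = make_data(n_test)

# (a) Squared loss: minimum-norm interpolator of the +/-1 labels,
#     w = X^T (X X^T)^{-1} y, which fits the training labels exactly when d > n.
w_sq = X_tr.T @ np.linalg.solve(X_tr @ X_tr.T, y_tr)

# (b) Cross-entropy (logistic) loss minimized by gradient descent; on separable
#     data the iterates drift toward a max-margin (SVM-like) direction.
w_ce = np.zeros(d)
for _ in range(10000):
    margins = y_tr * (X_tr @ w_ce)
    grad = -(X_tr.T @ (y_tr / (1.0 + np.exp(margins)))) / n
    w_ce -= 0.2 * grad

acc = lambda w: np.mean(np.sign(X_te @ w) == y_te)
print(f"squared-loss interpolator test accuracy: {acc(w_sq):.3f}")
print(f"cross-entropy (logistic) test accuracy:  {acc(w_ce):.3f}")
```

Under these assumptions, the two classifiers typically report comparable test accuracy, mirroring the empirical observations cited above; the exact numbers depend on the synthetic data parameters.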
