Supervised Models Can Generalize Also When Trained on Random Labels
Allerbo, Oskar, Schön, Thomas B.
The success of unsupervised learning raises the question of whether supervised models, too, can be trained without using the information in the output $y$. In this paper, we demonstrate that this is indeed possible. The key step is to formulate the model as a smoother, i.e., in the form $\hat{f}=Sy$, and to construct the smoother matrix $S$ independently of $y$, e.g. by training on random labels. We present a simple model selection criterion based on the distribution of the out-of-sample predictions and show that, in contrast to cross-validation, this criterion can also be used without access to $y$. We demonstrate on real and synthetic data that $y$-free trained versions of linear and kernel ridge regression, smoothing splines, and neural networks perform similarly to their standard $y$-based versions and, most importantly, significantly better than random guessing.
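The smoother formulation in the abstract can be illustrated with kernel ridge regression, whose smoother matrix $S = K(K+\lambda I)^{-1}$ depends only on the inputs and the hyperparameters, not on $y$. The sketch below is a minimal illustration of this separation, not the paper's full method; the RBF kernel, the data, and the fixed hyperparameter values are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # RBF (Gaussian) kernel between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

# The smoother matrix S = K (K + lam*I)^{-1} is built from X alone;
# "y-free training" in the paper's sense means fixing the hyperparameters
# (here lam, gamma) without ever consulting y, e.g. via random labels.
lam = 1e-2
n = len(X)
K = rbf_kernel(X, X)
S = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))

# Only at prediction time does y enter, through a single linear map.
f_hat = S @ y
```

The point of the example is that everything expensive (building and inverting the kernel matrix) happens before $y$ is seen; the labels are applied only in the final matrix-vector product.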
May-23-2025
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > Sweden
- Uppsala County > Uppsala (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- California (0.04)
- Kansas > Riley County
- Manhattan (0.04)
- Genre:
- Instructional Material > Course Syllabus & Notes (0.46)
- Research Report (0.64)
- Industry:
- Energy (0.92)
- Technology:
- Information Technology > Artificial Intelligence > Machine Learning
- Neural Networks (1.00)
- Performance Analysis > Accuracy (0.34)
- Statistical Learning
- Clustering (0.46)
- Regression (0.46)