Supervised Models Can Generalize Also When Trained on Random Labels
Allerbo, Oskar, Schön, Thomas B.
The success of unsupervised learning raises the question of whether supervised models, too, can be trained without using the information in the output $y$. In this paper, we demonstrate that this is indeed possible. The key step is to formulate the model as a smoother, i.e., in the form $\hat{f}=Sy$, and to construct the smoother matrix $S$ independently of $y$, e.g. by training on random labels. We present a simple model selection criterion based on the distribution of the out-of-sample predictions and show that, in contrast to cross-validation, this criterion can also be used without access to $y$. We demonstrate on real and synthetic data that $y$-free trained versions of linear and kernel ridge regression, smoothing splines, and neural networks perform similarly to their standard $y$-based versions and, most importantly, significantly better than random guessing.
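The smoother formulation in the abstract can be illustrated with kernel ridge regression, whose smoother matrix $S = K(K+\lambda I)^{-1}$ depends only on the inputs and the hyperparameters, not on $y$. The sketch below is a minimal illustration of this separation, not the paper's full method; the RBF kernel, the data, and the fixed hyperparameter values are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # RBF (Gaussian) kernel between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

# The smoother matrix S = K (K + lam*I)^{-1} is built from X alone;
# "y-free training" in the paper's sense means fixing the hyperparameters
# (here lam, gamma) without ever consulting y, e.g. via random labels.
lam = 1e-2
n = len(X)
K = rbf_kernel(X, X)
S = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))

# Only at prediction time does y enter, through a single linear map.
f_hat = S @ y
```

The point of the example is that everything expensive (building and inverting the kernel matrix) happens before $y$ is seen; the labels are applied only in the final matrix-vector product.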
May-23-2025
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > Sweden
- Uppsala County > Uppsala (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- California (0.04)
- Kansas > Riley County
- Manhattan (0.04)
- Genre:
- Instructional Material > Course Syllabus & Notes (0.46)
- Research Report (0.64)
- Industry:
- Energy (0.92)
- Technology:
- Information Technology > Artificial Intelligence > Machine Learning
- Neural Networks (1.00)
- Performance Analysis > Accuracy (0.34)
- Statistical Learning
- Clustering (0.46)
- Regression (0.46)