From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning

Neural Information Processing Systems

Weak-to-strong generalization refers to the phenomenon where a stronger model trained under supervision from a weaker one can outperform its teacher. While prior studies aim to explain this effect, most theoretical insights are limited to abstract frameworks or linear/random feature models. In this paper, we provide a formal analysis of weak-to-strong generalization from a linear CNN (weak) to a two-layer ReLU CNN (strong). We consider structured data composed of label-dependent signals of varying difficulty and label-independent noise, and analyze gradient descent dynamics when the strong model is trained on data labeled by the pretrained weak model. Our analysis identifies two regimes, data-scarce and data-abundant, based on the signal-to-noise characteristics of the dataset, and reveals distinct mechanisms of weak-to-strong generalization. In the data-scarce regime, generalization occurs via benign overfitting or fails via harmful overfitting, depending on the amount of data, and we characterize the transition boundary. In the data-abundant regime, generalization emerges in the early phase through label correction, but we observe that overtraining can subsequently degrade performance.






A Meta-Analysis of Overfitting in Machine Learning

Neural Information Processing Systems

In each competition, numerous practitioners repeatedly evaluated their progress against a holdout set that forms the basis of a public ranking available throughout the competition. Performance on a separate test set used only once determined the final ranking.


730ce0ae730f39e4d77b0f04a8afe4be-Supplemental-Conference.pdf

Neural Information Processing Systems

This paper studies the use of a machine learning-based estimator as a control variate for mitigating the variance of Monte Carlo sampling. Specifically, we seek to uncover the key factors that influence the efficiency of control variates in reducing variance.
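As a rough illustration of the control-variate idea mentioned in this abstract, the sketch below estimates E[exp(U)] for U uniform on [0, 1]. It is not the paper's method: in place of a machine learning-based estimator it uses a hand-picked control variate g(x) = x whose mean (0.5) is known exactly, and the function name `control_variate_estimate` is hypothetical. The point is only that a control variate well correlated with the target can shrink the per-sample variance.

```python
import numpy as np

def control_variate_estimate(f_vals, g_vals, g_mean):
    """Estimate E[f] using g, whose mean g_mean is known, as a control variate."""
    # Coefficient Cov(f, g) / Var(g), estimated from the same samples.
    cov = np.cov(f_vals, g_vals, ddof=1)
    beta = cov[0, 1] / cov[1, 1]
    # Subtract the centered control variate; the expectation is unchanged.
    adjusted = f_vals - beta * (g_vals - g_mean)
    return adjusted.mean(), adjusted.var(ddof=1), beta

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)

f_vals = np.exp(x)  # target integrand; the true mean is e - 1
g_vals = x          # assumed control variate (stand-in for a learned model)

plain_mean = f_vals.mean()
plain_var = f_vals.var(ddof=1)
cv_mean, cv_var, beta = control_variate_estimate(f_vals, g_vals, 0.5)

print(f"plain Monte Carlo: mean={plain_mean:.4f}, per-sample variance={plain_var:.4f}")
print(f"control variate:   mean={cv_mean:.4f}, per-sample variance={cv_var:.4f}")
```

The variance reduction here depends on how strongly g correlates with f, which is one of the factors the abstract says the paper sets out to examine.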



fa93d7bfb48450e1af63c8fa647d317f-Paper-Conference.pdf

Neural Information Processing Systems

To the best of our knowledge, our enhanced latent space blind model, optimization scheme, NFAE and FM2A have not been reported in the previous literature.