SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Apr-29-2026, 22:04:00 GMT–Neural Information Processing Systems

Knowledge distillation with unlabeled examples is a powerful training paradigm for producing compact, lightweight student models in applications where labeled data is limited but a large pool of unlabeled data is available. In this setting, a large teacher model generates "soft" pseudo-labels for the unlabeled dataset, which are then used to train the student model. Despite its success in a wide variety of applications, this approach has a shortcoming: the teacher's pseudo-labels are often noisy, which impairs student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples, which we call Student-Label Mixing (SLaM), and we show by evaluation on several standard benchmarks that it consistently improves over prior approaches. Finally, we show that SLaM comes with theoretical guarantees; along the way, we give an algorithm that improves the best-known sample complexity for learning halfspaces with margin under random classification noise, and we provide the first convergence analysis for so-called "forward loss-adjustment" methods.
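The baseline setting described in the abstract (a teacher producing soft pseudo-labels on unlabeled data, and a student trained on them) can be illustrated with a minimal sketch. This is plain soft-label distillation, not SLaM itself, whose mixing rule is not given in this abstract. The linear student, the function names, and the `temperature`, `steps`, and `lr` parameters are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (assumed knob)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_cross_entropy(student_logits, soft_labels):
    """Mean cross-entropy between soft labels and student predictions."""
    log_p = np.log(softmax(student_logits) + 1e-12)
    return float(-(soft_labels * log_p).sum(axis=1).mean())

def distill_linear_student(x_unlabeled, teacher_logits,
                           steps=200, lr=0.5, temperature=1.0):
    """Fit a hypothetical linear student to the teacher's soft pseudo-labels."""
    # The teacher turns its logits into "soft" pseudo-labels.
    soft_labels = softmax(teacher_logits, temperature)
    n, d = x_unlabeled.shape
    k = soft_labels.shape[1]
    w = np.zeros((d, k))
    for _ in range(steps):
        probs = softmax(x_unlabeled @ w)
        # Gradient of the mean soft cross-entropy for a softmax-linear model.
        grad = x_unlabeled.T @ (probs - soft_labels) / n
        w -= lr * grad
    return w, soft_labels
```

If the teacher's pseudo-labels are noisy, a student trained this way fits that noise as well, which is the shortcoming the abstract says SLaM is designed to address.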

artificial intelligence, dataset, machine learning

Country:
- North America > United States (0.28)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (0.93)
  - Neural Networks (0.68)
