AITopics | Statistical Learning

SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Neural Information Processing Systems | Apr-29-2026, 22:04:00 GMT

Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates "soft" pseudo-labels for the unlabeled dataset which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called "forward loss-adjustment" methods.
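The baseline setup described in this abstract, where a teacher produces "soft" pseudo-labels and a student is trained on them, can be sketched as follows. This is a minimal NumPy illustration of plain soft-label distillation only; it is not the SLaM method itself, and the function names, the temperature value, and the random logits are assumptions made for the example.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (assumed hyperparameter)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def soft_label_cross_entropy(student_logits, teacher_probs):
    """Mean cross-entropy between teacher soft labels and student predictions."""
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(teacher_probs * log_probs).sum(axis=-1).mean())


# Hypothetical logits standing in for real teacher and student model outputs
# on a batch of 4 unlabeled examples with 3 classes.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 3))
student_logits = rng.normal(size=(4, 3))

# The teacher's soft pseudo-labels for the unlabeled batch.
pseudo_labels = softmax(teacher_logits, temperature=2.0)

# The loss the student would minimize on this batch.
loss = soft_label_cross_entropy(student_logits, pseudo_labels)

# Lower bound on the loss: a student that reproduces the teacher exactly
# attains the entropy of the pseudo-labels.
matched_loss = soft_label_cross_entropy(teacher_logits / 2.0, pseudo_labels)
```

If the teacher's pseudo-labels are noisy, minimizing this loss pushes the student toward the same mistakes, which is the shortcoming the abstract refers to.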

artificial intelligence, dataset, machine learning, (16 more...)

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

d5470483dd38f71f7bd9e68ce1b94145-Paper-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 21:51:36 GMT

artificial intelligence, machine learning, representation, (12 more...)

Country: Asia (0.14)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

d4eed238cf5807c6b75face996302892-Paper-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 21:50:09 GMT

machine learning, natural language, segmentation, (18 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(2 more...)

Selective Sampling and Imitation Learning via Online Regression

Neural Information Processing Systems | Apr-29-2026, 21:36:39 GMT

We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much prior work assumes access to noiseless expert feedback, which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries.

algorithm, artificial intelligence, machine learning, (14 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Education (0.67)
Transportation (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

d43621ff2dfe39d298dcd4a41937c912-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Apr-29-2026, 21:35:17 GMT

machine learning, natural language, sketch, (17 more...)

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

d37c9ad425fe5b65304d500c6edcba00-Paper-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 21:21:07 GMT

artificial intelligence, equilibrium, machine learning, (19 more...)

Country: Europe (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Random Cuts are Optimal for Explainable k-Medians

Neural Information Processing Systems | Apr-29-2026, 21:19:39 GMT

We show that the RANDOMCOORDINATECUT algorithm gives the optimal competitive ratio for explainable k-medians in ℓ₁. The problem of explainable k-medians was introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian in 2020. Several groups of authors independently proposed a simple polynomial-time randomized algorithm for the problem and showed that this algorithm is O(log k log log k)-competitive. We provide a tight analysis of the algorithm and prove that its competitive ratio is upper bounded by 2 ln k + 2. This bound matches the Ω(log k) lower bound by Dasgupta et al. (2020).
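To make the idea of separating cluster centers with random axis-aligned threshold cuts concrete, here is a simplified Python sketch. It is an illustration under stated assumptions, not the paper's exact procedure: the function name is hypothetical, and the sampling rule (a uniform coordinate and a uniform threshold inside the centers' bounding interval on that coordinate) is an assumption, since the abstract does not specify the distribution. It also assumes the centers are pairwise distinct, without which the loop would not terminate.

```python
import numpy as np


def random_cut_partition(centers, rng):
    """Separate distinct centers into singleton leaves with random threshold cuts.

    Assumed sampling rule: uniform coordinate, uniform threshold inside the
    bounding interval of the centers along that coordinate.
    """
    lo, hi = centers.min(axis=0), centers.max(axis=0)
    leaves = [list(range(len(centers)))]  # start with one leaf holding all centers
    cuts = []                             # (axis, threshold) pairs that split some leaf
    while any(len(leaf) > 1 for leaf in leaves):
        axis = int(rng.integers(centers.shape[1]))
        theta = float(rng.uniform(lo[axis], hi[axis]))
        new_leaves, used = [], False
        for leaf in leaves:
            left = [c for c in leaf if centers[c, axis] <= theta]
            right = [c for c in leaf if centers[c, axis] > theta]
            if left and right:            # the cut separates centers in this leaf
                new_leaves += [left, right]
                used = True
            else:                         # the cut is ignored for this leaf
                new_leaves.append(leaf)
        leaves = new_leaves
        if used:
            cuts.append((axis, theta))
    return leaves, cuts


# Hypothetical example: four distinct centers in the plane.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
leaves, cuts = random_cut_partition(centers, np.random.default_rng(0))
```

Each leaf ends up holding exactly one center, so every cluster is described by a short list of coordinate thresholds, which is what makes the clustering explainable.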

algorithm, artificial intelligence, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

d3408794e41dd23e34634344d662f5e9-Paper-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 21:19:36 GMT

algorithm, artificial intelligence, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Particle-based Variational Inference with Generalized Wasserstein Gradient Flow

Neural Information Processing Systems | Apr-29-2026, 20:52:22 GMT

Particle-based variational inference methods (ParVIs) such as Stein variational gradient descent (SVGD) update the particles based on the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. However, the design of kernels is often non-trivial and can restrict the flexibility of the method. Recent work shows that functional gradient flow approximations with quadratic-form regularization terms can improve performance. In this paper, we propose a ParVI framework, called generalized Wasserstein gradient descent (GWG), based on a generalized Wasserstein gradient flow of the KL divergence, which can be viewed as a functional gradient method with a broader class of regularizers induced by convex functions. We show that GWG exhibits strong convergence guarantees. We also provide an adaptive version that automatically chooses the Wasserstein metric to accelerate convergence. In experiments, we demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems.
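For readers unfamiliar with the SVGD baseline mentioned in this abstract, a minimal NumPy sketch of one standard SVGD update with an RBF kernel is given below. It illustrates the kernelized particle update only and is not the proposed GWG method; the fixed bandwidth, step size, particle count, and the standard Gaussian target are assumptions chosen for the example.

```python
import numpy as np


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update with the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]  # diffs[i, j] = x_i - x_j
    sq_dists = (diffs ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))    # symmetric (n, n) matrix
    scores = grad_log_p(particles)                         # (n, d) score at each particle
    # Attraction: kernel-weighted average of scores pulls particles to high density.
    attraction = kernel @ scores
    # Repulsion: gradient of the kernel with respect to x_j pushes particles apart.
    repulsion = (kernel[:, :, None] * diffs).sum(axis=1) / bandwidth ** 2
    return particles + step_size * (attraction + repulsion) / n


# Assumed target: a standard Gaussian, whose score is grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, scale=1.0, size=(50, 1))  # start far from the target
initial_mean = float(particles.mean())

for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)

final_mean = float(particles.mean())
```

The kernel and its bandwidth are fixed by hand in this sketch, which reflects the kernel design burden the abstract points out.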

artificial intelligence, inequality, machine learning, (17 more...)

Country: Asia (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
