AITopics | kdgan

KDGAN: Knowledge Distillation with Generative Adversarial Networks

Neural Information Processing Systems Mar-16-2026, 17:26:02 GMT

Knowledge distillation (KD) aims to train a lightweight classifier suitable to provide accurate inference with constrained resources in multi-label learning. Instead of directly consuming feature-label pairs, the classifier is trained by a teacher, i.e., a high-capacity model whose training may be resource-hungry. The accuracy of the classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher. An alternative method is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GAN), which can ensure the classifier to learn the true data distribution at the equilibrium of this game. However, it may take excessively long time for such a two-player game to reach equilibrium due to high-variance gradient updates.

artificial intelligence, classifier, machine learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

KDGAN: Knowledge Distillation with Generative Adversarial Networks

Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi

Neural Information Processing Systems Feb-12-2026, 02:51:55 GMT

The accuracy of the classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher.

artificial intelligence, kdgan, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Poland (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

KDGAN: Knowledge Distillation with Generative Adversarial Networks

Neural Information Processing Systems Nov-20-2025, 21:41:04 GMT

Knowledge distillation (KD) aims to train a lightweight classifier suitable to provide accurate inference with constrained resources in multi-label learning. Instead of directly consuming feature-label pairs, the classifier is trained by a teacher, i.e., a high-capacity model whose training may be resource-hungry. The accuracy of the classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher. An alternative method is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GAN), which can ensure the classifier to learn the true data distribution at the equilibrium of this game. However, it may take excessively long time for such a two-player game to reach equilibrium due to high-variance gradient updates.

classifier, generative adversarial network, knowledge distillation, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

KDGAN: Knowledge Distillation with Generative Adversarial Networks

Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi

Neural Information Processing Systems Nov-20-2025, 13:54:59 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, kdgan, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Poland (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Reviews: KDGAN: Knowledge Distillation with Generative Adversarial Networks

Neural Information Processing Systems Oct-7-2024, 03:43:09 GMT

In this paper, the authors propose combining knowledge distillation and GANs to improve accuracy in multi-class classification. At the core, they demonstrate that combining these two approaches provides a better balance of sample efficiency and convergence to the ground truth distribution for improved accuracy. They claim two primary technical innovations (beyond combining these two approaches): using the Gumbel-Max trick for differentiability and having the classifier supervise the teacher (not just the teacher supervise the classifier). They argue that the improvements come from lower-variance gradients and that, at the equilibrium of the minimax game, the classifier converges to the true label distribution. The idea of combining these two perspectives is interesting, and both the theoretical arguments and the empirical results are compelling.
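
For readers unfamiliar with the Gumbel-Max trick mentioned in the review, the following is a minimal sketch in plain Python and is not taken from the paper. It shows the standard fact that adding independent Gumbel(0, 1) noise to a vector of logits and taking the argmax yields a draw from the softmax distribution over those logits. The function names (`gumbel_noise`, `gumbel_max_sample`, `empirical_frequencies`) and the example logits are assumptions made for illustration only.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse transform sampling."""
    # rng.random() lies in [0, 1); clamp away from 0 so log(u) is defined.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng):
    """Return a class index distributed as softmax(logits)."""
    perturbed = [logit + gumbel_noise(rng) for logit in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)


def softmax(logits):
    """Numerically stable softmax, used here only as a reference."""
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_frequencies(logits, n_draws, seed=0):
    """Estimate class frequencies from repeated Gumbel-Max draws."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n_draws):
        counts[gumbel_max_sample(logits, rng)] += 1
    return [c / n_draws for c in counts]


if __name__ == "__main__":
    example_logits = [1.0, 0.0, -1.0]  # hypothetical logits
    print("softmax:  ", softmax(example_logits))
    print("empirical:", empirical_frequencies(example_logits, 20000))
```

The argmax itself is not differentiable, which is why a continuous relaxation of this sampling step is typically used when gradients are needed.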

classifier, generative adversarial network, knowledge distillation, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

KDGAN: Knowledge Distillation with Generative Adversarial Networks

Wang, Xiaojie, Zhang, Rui, Sun, Yu, Qi, Jianzhong

Neural Information Processing Systems Feb-14-2020, 06:28:12 GMT

Knowledge distillation (KD) aims to train a lightweight classifier suitable to provide accurate inference with constrained resources in multi-label learning. Instead of directly consuming feature-label pairs, the classifier is trained by a teacher, i.e., a high-capacity model whose training may be resource-hungry. The accuracy of the classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher. An alternative method is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GAN), which can ensure the classifier to learn the true data distribution at the equilibrium of this game. However, it may take excessively long time for such a two-player game to reach equilibrium due to high-variance gradient updates.

classifier, generative adversarial network, knowledge distillation, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

KDGAN: Knowledge Distillation with Generative Adversarial Networks

Wang, Xiaojie, Zhang, Rui, Sun, Yu, Qi, Jianzhong

Neural Information Processing Systems Dec-31-2018

Knowledge distillation (KD) aims to train a lightweight classifier suitable to provide accurate inference with constrained resources in multi-label learning. Instead of directly consuming feature-label pairs, the classifier is trained by a teacher, i.e., a high-capacity model whose training may be resource-hungry. The accuracy of the classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher. An alternative method is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GAN), which can ensure the classifier to learn the true data distribution at the equilibrium of this game. However, it may take excessively long time for such a two-player game to reach equilibrium due to high-variance gradient updates. To address these limitations, we propose a three-player game named KDGAN consisting of a classifier, a teacher, and a discriminator. The classifier and the teacher learn from each other via distillation losses and are adversarially trained against the discriminator via adversarial losses. By simultaneously optimizing the distillation and adversarial losses, the classifier will learn the true data distribution at the equilibrium. We approximate the discrete distribution learned by the classifier (or the teacher) with a concrete distribution. From the concrete distribution, we generate continuous samples to obtain low-variance gradient updates, which speed up the training. Extensive experiments using real datasets confirm the superiority of KDGAN in both accuracy and training speed.
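
The abstract's step of approximating a discrete distribution with a concrete distribution can be illustrated with a small sketch. This is the standard concrete (Gumbel-Softmax) relaxation, not the authors' implementation: the listing does not give the paper's exact formulation, and the temperature parameter `tau`, the function names, and the example logits are all assumptions for illustration. A relaxed sample is softmax((logits + Gumbel noise) / tau), which is a continuous point on the probability simplex rather than a hard class index.

```python
import math
import random


def relax(logits, noise, tau):
    """Map logits plus noise to a point on the simplex at temperature tau."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [(logit + g) / tau for logit, g in zip(logits, noise)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_concrete(logits, tau, rng):
    """Draw one relaxed (continuous) sample using Gumbel(0, 1) noise."""
    noise = []
    for _ in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        noise.append(-math.log(-math.log(u)))
    return relax(logits, noise, tau)


if __name__ == "__main__":
    rng = random.Random(0)
    example_logits = [1.0, 0.0, -1.0]  # hypothetical classifier logits
    for tau in (5.0, 1.0, 0.1):
        print(tau, sample_concrete(example_logits, tau, rng))
```

Lower values of `tau` push samples toward one-hot vectors, while higher values smooth them toward uniform. Because the sample is a smooth function of the logits, gradients can flow through it, which is the property the abstract relies on for low-variance updates.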

artificial intelligence, kdgan, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.46)
North America > United States (0.28)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
