Reviews: KDGAN: Knowledge Distillation with Generative Adversarial Networks

Neural Information Processing Systems 

In this paper, the authors propose combining knowledge distillation and GANs to improve accuracy in multi-class classification. At its core, the paper demonstrates that combining these two approaches strikes a better balance between sample efficiency and convergence to the ground-truth distribution, yielding improved accuracy. The authors claim two primary technical innovations beyond the combination itself: using the Gumbel-Max trick to make the discrete sampling step differentiable, and having the classifier supervise the teacher (rather than only the teacher supervising the classifier). They argue that the improvements stem from lower-variance gradients and that the equilibrium of the minimax game corresponds to convergence to the true label distribution. The idea of combining these two perspectives is interesting, and both the theoretical arguments and the empirical results are compelling.
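To make the differentiability claim concrete: a standard way to relax discrete sampling is the Gumbel-Softmax (a temperature-smoothed version of the Gumbel-Max trick). The sketch below is illustrative only and not taken from the paper; the function name and temperature value are my own choices.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw a relaxed (differentiable) one-hot sample via the Gumbel-Softmax trick."""
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Softmax over perturbed logits; as temperature -> 0 this
    # approaches the hard argmax of the Gumbel-Max trick
    y = (logits + gumbel) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])
sample = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
```

Because the sample is a smooth function of the logits, gradients can flow through it, which is what enables the low-variance gradient estimates the authors discuss.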