Zero-shot Knowledge Transfer via Adversarial Belief Matching

Neural Information Processing Systems

Performing knowledge transfer from a large teacher network to a smaller student is a popular task in modern deep learning applications. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. We propose a novel method which trains a student to match the predictions of its teacher without using any data or metadata. We achieve this by training an adversarial generator to search for images on which the student poorly matches the teacher, and then using them to train the student. Our resulting student closely approximates its teacher for simple datasets like SVHN, and on CIFAR10 we improve on the state-of-the-art for few-shot distillation (with $100$ images per class), despite using no data. Finally, we also propose a metric to quantify the degree of belief matching between teacher and student in the vicinity of decision boundaries, and observe a significantly higher match between our zero-shot student and the teacher, than between a student distilled with real data and the teacher.
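The adversarial loop described in the abstract can be illustrated with a deliberately tiny, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the teacher and student are 2-class linear classifiers on 2-D inputs, the "generator" is a single matrix followed by `tanh`, the mismatch is measured with KL divergence between teacher and student predictions, and the generator gradient is estimated by finite differences. The function names (`train`, `mean_kl`, `generate`) and all hyperparameters are hypothetical. The only part it shares with the text above is the alternation: the generator moves toward inputs where the student disagrees with the teacher, and the student is then trained on those inputs.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities of a linear classifier (one weight row per class)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def kl(p, q):
    """KL(p || q) for two strictly positive probability vectors."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def generate(gen, z):
    """Toy generator (assumption): pseudo-input x = tanh(G z), in [-1, 1]."""
    return [math.tanh(sum(g * zi for g, zi in zip(row, z))) for row in gen]


def mean_kl(teacher, student, xs):
    """Average teacher/student mismatch over a list of inputs."""
    return sum(kl(predict(teacher, x), predict(student, x)) for x in xs) / len(xs)


def train(teacher, steps=200, batch=16, lr_student=0.5, lr_gen=0.1, seed=0):
    """Alternate generator ascent and student descent; no real data is used."""
    rng = random.Random(seed)
    student = [[0.0, 0.0], [0.0, 0.0]]  # starts with uniform predictions
    gen = [[1.0, 0.0], [0.0, 1.0]]
    eps = 1e-4
    for _ in range(steps):
        zs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(batch)]

        def gen_loss(g):
            return mean_kl(teacher, student, [generate(g, z) for z in zs])

        # Generator step: gradient ASCENT on the mismatch, using central
        # finite differences over the four generator parameters.
        grad_g = [[0.0, 0.0], [0.0, 0.0]]
        for i in range(2):
            for j in range(2):
                up = [row[:] for row in gen]
                down = [row[:] for row in gen]
                up[i][j] += eps
                down[i][j] -= eps
                grad_g[i][j] = (gen_loss(up) - gen_loss(down)) / (2 * eps)
        for i in range(2):
            for j in range(2):
                gen[i][j] += lr_gen * grad_g[i][j]

        # Student step: gradient DESCENT on the same mismatch. For softmax
        # outputs, d KL(t || s) / d student_logit_c = s_c - t_c.
        xs = [generate(gen, z) for z in zs]
        grad_s = [[0.0, 0.0], [0.0, 0.0]]
        for x in xs:
            t, s = predict(teacher, x), predict(student, x)
            for c in range(2):
                for d in range(2):
                    grad_s[c][d] += (s[c] - t[c]) * x[d] / batch
        for c in range(2):
            for d in range(2):
                student[c][d] -= lr_student * grad_s[c][d]
    return student, gen


if __name__ == "__main__":
    teacher = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical fixed teacher
    probes = [[a / 2, b / 2] for a in range(-2, 3) for b in range(-2, 3)]
    untrained = [[0.0, 0.0], [0.0, 0.0]]
    student, _ = train(teacher)
    print("mean KL before:", mean_kl(teacher, untrained, probes))
    print("mean KL after: ", mean_kl(teacher, student, probes))
```

In this toy setting the student only ever sees generated inputs, yet its mismatch with the teacher on a separate probe grid can be compared before and after training. A realistic version would replace the linear models with deep networks and the finite-difference step with backpropagation through the generator.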


Reviews: Zero-shot Knowledge Transfer via Adversarial Belief Matching

Neural Information Processing Systems

While I am only guessing that performance may degrade as a function of dataset scale, it is not hard to imagine advances in GANs that could make that degradation smaller and hence make the proposed method more useful. Further, even in an adversarial setting, it may be possible to guess what kind of inputs are relevant, or to extend the method to a few-shot or hybrid approach. I am positively surprised that the features of the student have transferability comparable to the teacher's. I was concerned that some sort of overfitting to the teacher's decision boundary was possible, but this does not seem to be the case. While I agree with the authors that, in most cases, those releasing research models will not go out of their way to vaccinate them against zero-shot distillation, the proposed method could be used to (somewhat) copy and repurpose information stored in a hardware model. Take, for example, Tesla's Autopilot, which uses several neural networks and is trained on tens of billions of images that are not available to the world.


Reviews: Zero-shot Knowledge Transfer via Adversarial Belief Matching

Neural Information Processing Systems

All reviewers appreciated the work and recommend acceptance. The authors are encouraged to address the reviewers' comments and to include the author response in the camera-ready version of the manuscript.


Zero-shot Knowledge Transfer via Adversarial Belief Matching

Micaelli, Paul, Storkey, Amos J.

Neural Information Processing Systems
