Reviews: Zero-shot Knowledge Transfer via Adversarial Belief Matching

Neural Information Processing Systems 

While I am only guessing that performance may degrade as a function of dataset scale, it is not hard to imagine advances in GANs that would make this degradation smaller, and hence make the proposed method more useful. Further, even in an adversarial setting, it may be possible to guess what kinds of inputs are relevant, or to extend the method to a few-shot or hybrid approach.

I am positively surprised that the student's features transfer comparably to the teacher's; I was concerned that some form of overfitting to the teacher's decision boundary might occur, but this does not seem to be the case.

While I agree with the authors that, in most cases, those releasing research models will not go out of their way to vaccinate them against zero-shot distillation, the proposed method could be used to (partially) copy and repurpose the information stored in a model deployed in hardware. Take, for example, Tesla's Autopilot, which runs several neural networks trained on tens of billions of images that are not available to the public.
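To make the review's concern concrete, the core loop of zero-shot distillation can be illustrated in miniature: a generator produces pseudo-inputs that maximize the teacher-student KL divergence (adversarial belief matching), while the student descends the same KL to copy the teacher's beliefs. The sketch below is only a 1-D, 2-class toy with hand-derived gradients and invented parameter values (`w_teacher`, learning rates), not the authors' actual architecture; the single scalar `x` stands in for the generator.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

w_teacher = 1.5   # frozen "teacher" logit weight (hypothetical toy value)
w_student = 0.0   # student starts with no knowledge of the data
x = 0.5           # pseudo-input, playing the role of the generator's output
lr_gen, lr_student = 0.5, 0.2

for step in range(3000):
    # Generator step: ascend KL(teacher || student) in x to find the
    # input where the two networks' beliefs disagree most.
    for _ in range(5):
        p = sigmoid(w_teacher * x)   # teacher belief
        q = sigmoid(w_student * x)   # student belief
        # dKL/dx for the binary case: dKL/da_t = p(1-p)(a_t - a_s),
        # dKL/da_s = q - p, chained through a_t = w_t x, a_s = w_s x.
        grad_x = (p * (1 - p) * (w_teacher - w_student) * x * w_teacher
                  + (q - p) * w_student)
        x = max(-3.0, min(3.0, x + lr_gen * grad_x))  # keep inputs bounded
    # Student step: descend the same KL so its belief matches the teacher's.
    p = sigmoid(w_teacher * x)
    q = sigmoid(w_student * x)
    w_student -= lr_student * (q - p) * x

print(abs(w_student - w_teacher))  # gap shrinks without ever seeing real data
```

The point of the toy is the reviewer's worry in code form: nothing here requires access to the teacher's training set, only query access to its outputs, which is exactly what makes "copying" a deployed model plausible.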