Moonshine: Distilling with Cheap Convolutions

Elliot J. Crowley, Gavin Gray, Amos J. Storkey

Neural Information Processing Systems 

Using attention transfer, we provide Pareto curves and tables for the distillation of residual networks on four benchmark datasets, showing the trade-off between memory and accuracy.
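
To make the notion of a Pareto curve concrete, the following is a minimal sketch of how a memory versus accuracy frontier could be extracted from a set of candidate student networks. It is not the authors' code. The function name `pareto_front`, the choice of parameter count as the memory measure, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each point is (parameter count in millions, test error in %).
# Both quantities are "lower is better". The units are an assumption.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats on both size and error."""
    front: List[Point] = []
    best_error = float("inf")
    # Sweep from smallest to largest model; a larger model stays on the
    # frontier only if it strictly lowers the best error seen so far.
    for size, error in sorted(points):
        if error < best_error:
            front.append((size, error))
            best_error = error
    return front


# Hypothetical candidates, NOT results from the paper.
candidates = [(0.3, 9.0), (0.5, 9.5), (1.0, 7.0), (2.0, 7.5), (4.0, 6.0)]
front = pareto_front(candidates)
print(front)
```

Plotting the returned points with size on one axis and error on the other gives a curve of the kind described: every point off the curve is matched or beaten on both axes by some point on it.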