Moonshine: Distilling with Cheap Convolutions
Elliot J. Crowley, Gavin Gray, Amos J. Storkey
Neural Information Processing Systems
Using attention transfer, we provide Pareto curves and tables for the distillation of residual networks on four benchmark datasets, indicating the memory versus accuracy payoff.
Feb-12-2026, 18:36:37 GMT