Moonshine: Distilling with Cheap Convolutions
Elliot J. Crowley, Gavin Gray, Amos J. Storkey
Neural Information Processing Systems
Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff.