No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations

Walter Simoncini, Andrei Bursuc

Neural Information Processing Systems 

Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated via k-nearest neighbor classification on 11 vision datasets, 5 natural language processing datasets, and 2 audio datasets.
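A minimal PyTorch sketch of this pipeline follows, under stated assumptions rather than as the authors' implementation: the ResNet-18 backbone, the SimSiam-style negative-cosine-similarity objective, the small linear head whose parameters supply the gradients, and the Gaussian random projection are all illustrative choices; the paper's actual self-supervised objectives and projection scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

torch.manual_seed(0)

# Frozen pretrained backbone producing the base embedding (512-d for ResNet-18).
# weights=None is used here only to keep the sketch self-contained.
backbone = resnet18(weights=None)
backbone.fc = nn.Identity()
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Hypothetical small head through which the self-supervised loss is computed;
# only its parameters contribute gradients, keeping the gradient vector small.
head = nn.Linear(512, 128)
grad_params = list(head.parameters())
grad_dim = sum(p.numel() for p in grad_params)

# Fixed Gaussian random projection reducing the gradient to proj_dim dimensions.
proj_dim = 256
projection = torch.randn(proj_dim, grad_dim) / grad_dim ** 0.5

def ssl_loss(z1, z2):
    # Illustrative self-supervised objective: negative cosine similarity
    # between the head outputs of two augmented views of the same input.
    return -F.cosine_similarity(head(z1), head(z2), dim=-1).mean()

def gradient_feature(view1, view2):
    """Projected self-supervised gradient concatenated with the embedding."""
    z1, z2 = backbone(view1), backbone(view2)
    loss = ssl_loss(z1, z2)
    grads = torch.autograd.grad(loss, grad_params)
    flat = torch.cat([g.flatten() for g in grads])   # one gradient vector
    return torch.cat([projection @ flat, z1.squeeze(0)])

# Usage: two augmented views of one image (random tensors stand in here).
x1, x2 = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
feature = gradient_feature(x1, x2)
print(feature.shape)  # torch.Size([768]) = proj_dim + embedding dim
```

For the evaluation step described above, such features would be extracted for every train and test example and passed to an off-the-shelf k-nearest neighbor classifier (for instance, sklearn.neighbors.KNeighborsClassifier).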