compressing representation
CompRess: Self-Supervised Learning by Compressing Representations
Self-supervised learning aims to learn good representations with unlabeled data. Recent works have shown that larger models benefit more from self-supervised learning than smaller models. As a result, the gap between supervised and self-supervised learning has been greatly reduced for larger models. In this work, instead of designing a new pseudo task for self-supervised learning, we develop a model compression method to compress an already learned, deep self-supervised model (teacher) to a smaller one (student). We train the student model so that it mimics the relative similarity between the datapoints in the teacher's embedding space. For AlexNet, our method outperforms all previous methods including the fully supervised model on ImageNet linear evaluation (59.0%
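The similarity-mimicking objective described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes that "relative similarity" is measured as a softmax over cosine similarities between a query and a set of anchor points, and that the student is trained to match the teacher's distribution under a KL divergence. The function names, the temperature value, and the use of random vectors in place of real embeddings are all assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def similarity_distribution(queries, anchors, temperature):
    """Softmax over the cosine similarities between each query and all anchors."""
    logits = (l2_normalize(queries) @ l2_normalize(anchors).T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def similarity_distillation_loss(teacher_q, teacher_a, student_q, student_a,
                                 temperature=0.04):
    """Mean KL(teacher || student) between the two similarity distributions.

    The teacher and student embeddings may have different dimensions, because
    only similarities within each embedding space are compared.
    The temperature default is an illustrative assumption.
    """
    p_teacher = similarity_distribution(teacher_q, teacher_a, temperature)
    p_student = similarity_distribution(student_q, student_a, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=1)
    return float(kl.mean())

# Hypothetical usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
teacher_queries = rng.normal(size=(4, 128))   # 4 query images, 128-d teacher space
teacher_anchors = rng.normal(size=(64, 128))  # 64 anchor images
student_queries = rng.normal(size=(4, 32))    # same images, 32-d student space
student_anchors = rng.normal(size=(64, 32))
loss = similarity_distillation_loss(teacher_queries, teacher_anchors,
                                    student_queries, student_anchors)
print(loss)
```

Because the loss depends only on similarities inside each space, a student whose embedding is a rotated copy of the teacher's would incur a loss close to zero, which is one reason such an objective does not require the two models to share an embedding dimension.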
Review for NeurIPS paper: CompRess: Self-Supervised Learning by Compressing Representations
Weaknesses: One big issue that I see is that it is not very meaningful to do model compression for unsupervised models before the current evolution of contrastive approaches plateaus. Why would we still need the distillation method proposed today, instead of directly using a newer contrastive method? If we do not, this distillation method will quickly fade away. For instance, IIRC, [a] achieves 73.0% linear accuracy and transfers better to Pascal VOC and COCO, so the effectiveness of this paper will be largely discounted (by the way, [a] might be discussed as well). But it is unclear how your method could improve upon better self-supervised methods, e.g., can you improve upon [a] using your method out of the box?
Review for NeurIPS paper: CompRess: Self-Supervised Learning by Compressing Representations
This paper presents an approach for distillation of self-supervised models. All the reviewers acknowledge that the paper presents a simple approach which outperforms several baselines. There are some concerns with respect to: (a) the speed with which the SSL field changes and the applicability to new approaches; (b) the clarity of the tables; (c) the claim of outperforming the supervised AlexNet. There was a rebuttal which answered some of the concerns. The AC agrees with the authors that we should not wait for better models before working on model compression.
Compressing Representations for Embedded Deep Learning
Assine, Juliano S., Godoy, Alan, Valle, Eduardo
Despite recent advances in architectures for mobile devices, the computational requirements of deep learning remain prohibitive for most embedded devices. To address that issue, we envision sharing the computational costs of inference between local devices and the cloud, taking advantage of the compression performed by the first layers of the networks to reduce communication costs. Inference in such a distributed setting would allow new applications, but requires balancing a triple trade-off between computation cost, communication bandwidth, and model accuracy. We explore that trade-off by studying the compressibility of representations at different stages of MobileNetV2, showing that those results agree with theoretical intuitions about deep learning, and that an optimal splitting layer for the network can be found with a simple PCA-based compression scheme.
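To make the idea of PCA-based compression of an intermediate representation concrete, here is a minimal sketch. It is not the authors' scheme: the abstract does not say how PCA is applied to the activations, so this example assumes each activation is flattened into a vector, that the projection is fitted offline, and that the device sends only the low-dimensional codes to the cloud. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit PCA on rows of `features`; return the mean and the top principal axes."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def compress(features, mean, components):
    """Device side: project activations onto the principal axes (the data to send)."""
    return (features - mean) @ components.T

def decompress(codes, mean, components):
    """Cloud side: map the received codes back to the original activation space."""
    return codes @ components + mean

# Synthetic stand-in for flattened activations at a candidate splitting layer:
# 64-d vectors that lie close to an 8-d subspace (an assumption for this demo).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 8))
mixing = rng.normal(size=(8, 64))
activations = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

mean, components = fit_pca(activations, n_components=8)
codes = compress(activations, mean, components)         # shape (200, 8)
reconstructed = decompress(codes, mean, components)     # shape (200, 64)
relative_error = (np.linalg.norm(activations - reconstructed)
                  / np.linalg.norm(activations))
print(codes.shape, relative_error)
```

Under these assumptions, repeating the measurement for several candidate layers and several values of `n_components` would give one way to compare bandwidth against reconstruction quality, although the accuracy of the full model would still need to be evaluated separately.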