Review for NeurIPS paper: CompRess: Self-Supervised Learning by Compressing Representations

Neural Information Processing Systems 

Weaknesses: One big issue I see is that it's not very meaningful to do model compression for unsupervised models before the current evolution of contrastive approaches plateaus. If a stronger contrastive method appears, why would we still need the distillation method proposed here, instead of directly using the new contrastive method? In that case, this distillation method will quickly fade away. For instance, if I recall correctly, [a] achieves 73.0% linear accuracy and transfers better to Pascal VOC and COCO, which largely discounts the effectiveness of this paper (by the way, [a] should probably be discussed as well). It's also unclear how your method could improve upon better self-supervised methods, e.g., can you improve upon [a] using your method out of the box?