Review for NeurIPS paper: Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Neural Information Processing Systems

While it is interesting that self-supervised methods are more invariant to occlusion, it is unclear why they are not also more invariant to the other augmentations used during training. For example, supervised learning appears more invariant to "Illumination Color" (Top-25 category), even though self-supervised methods use aggressive color augmentation. The paper does not discuss this discrepancy or what it implies.

Next, while the analysis of transfer performance as a function of cropped vs. original training and test datasets is interesting, it is unclear whether the results really support the authors' interpretation. They find that training and testing on the same type of images (i.e., both cropped or both original) gives better transfer performance. This is to be expected, as it minimizes the domain gap between training and testing.
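One way to make the reviewer's comparison concrete is to score how much a representation changes under a given transformation. The sketch below is a generic, hypothetical proxy (mean cosine similarity between the embedding of an image and the embedding of its transformed version). It is not the metric behind the "Top-25" numbers quoted in the review, and the toy encoders, "images", and brightness shift are all illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def invariance_score(encode: Callable[[Vector], Vector],
                     images: List[Vector],
                     transform: Callable[[Vector], Vector]) -> float:
    """Mean cosine similarity between encode(x) and encode(transform(x)).

    1.0 means the representation is unchanged by the transformation;
    lower values mean it is more sensitive to it.
    """
    sims = [cosine(encode(x), encode(transform(x))) for x in images]
    return sum(sims) / len(sims)


# Hypothetical toy setup: "images" are flat pixel lists and the transform is
# a global brightness shift, a crude stand-in for an illumination change.
def brighten(x: Vector) -> Vector:
    return [p + 0.5 for p in x]


def identity_encoder(x: Vector) -> Vector:
    return list(x)


def centered_encoder(x: Vector) -> Vector:
    # Subtracting the mean cancels any constant shift, so this encoder is
    # invariant to brighten() by construction.
    mu = sum(x) / len(x)
    return [p - mu for p in x]


images = [[0.1, 0.9, 0.3, 0.5], [0.8, 0.2, 0.6, 0.0]]
print(invariance_score(identity_encoder, images, brighten))
print(invariance_score(centered_encoder, images, brighten))
```

The mean-centering encoder scores 1.0 up to floating-point rounding because it cancels the shift exactly, while the identity encoder scores lower. Computing such a score separately for each transformation (occlusion, color, viewpoint) gives the kind of per-augmentation comparison the review asks the authors to explain.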


Review for NeurIPS paper: Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Neural Information Processing Systems

The topic of the paper is very relevant to the NeurIPS community, given the increased interest in understanding self-supervised learning. Reviewers appreciated the direction the paper takes, i.e., studying the invariances learned by self-supervised learning methods and comparing them with supervised representations. There were some concerns about the interpretation of the empirical results, which have been addressed in the author response. This paper takes an important first step towards understanding the invariances in self-supervised representations and their implications for downstream tasks, and would be of interest to the NeurIPS community.


Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Neural Information Processing Systems

Self-supervised representation learning approaches have recently surpassed their supervised learning counterparts on downstream tasks like object detection and image classification. Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class. First, we demonstrate that approaches like MoCo and PIRL learn occlusion-invariant representations. However, they fail to capture viewpoint and category instance invariance, which are crucial components for object recognition. Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like ImageNet.
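The instance-classification idea above can be made concrete with an InfoNCE-style contrastive loss, in which each image is its own "class": query i must pick out key i (an embedding of another augmented view of the same image) from among all keys. This is a simplified, hypothetical sketch, not MoCo's or PIRL's exact formulation. MoCo draws its negatives from a queue of keys produced by a momentum encoder, whereas here the other images in the same batch serve as negatives; the temperature value and toy embeddings are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(queries: Sequence[Sequence[float]],
                  keys: Sequence[Sequence[float]],
                  temperature: float = 0.2) -> float:
    """Mean InfoNCE loss over a batch (simplified, in-batch negatives).

    queries[i] and keys[i] embed two augmented views of image i, so key i is
    the single positive for query i and every other key is a negative: each
    image is treated as its own class.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        # Log-sum-exp with the max subtracted, for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with target class i: -log softmax(logits)[i].
        total += log_denom - logits[i]
    return total / len(q)


# Hypothetical toy embeddings: row i of each list is one view of image i.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce_loss(views_a, views_b))        # matching views: low loss
print(info_nce_loss(views_a, views_b[::-1]))  # mismatched views: high loss
```

With matching views the loss is close to zero (about 0.0067), and swapping the keys so that each query's positive is the wrong image raises it to about 5. The loss rewards pulling augmented views of the same image together and pushing other images apart; which invariances this produces depends on the augmentations chosen.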