Evaluation Protocol: The most ambitious aim of self-supervised learning is to create universal visual representations

Neural Information Processing Systems 

We thank the reviewers for valuable feedback. Before addressing individual comments, we clarify common concerns. Moreover, "image-level" vs "pixel-level" training has no bearing on the validity of evaluating with Any method that uses a CNN learns more than just "image-level" representations; for Our task is to learn pixel-wise semantic-aware embeddings from scratch. We will update the final version to reflect the full 200 training epochs. We first sample regions, then a fixed number of pixels within chosen regions.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found