Self-Supervised Visual Representation Learning from Hierarchical Grouping

Oct-11-2024, 06:22:31 GMT–Neural Information Processing Systems

We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks.
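As a rough illustration of what "pairwise distances respect the region hierarchy" could mean in practice, the sketch below turns a region merge tree into a region-to-region distance (the height at which two regions are first merged) and scores pixel embeddings with a triplet hinge loss. This is a simplified stand-in, not the paper's loss: the function names `hierarchy_distance` and `triplet_hierarchy_loss`, the `(a, b, height)` merge format, and the `margin` value are all assumptions made for this example.

```python
import numpy as np

def hierarchy_distance(n_leaves, merges):
    """Distance between leaf regions = height of the merge that first joins them.

    merges: list of (a, b, height); the k-th merge creates node id n_leaves + k.
    (Hypothetical format chosen for this sketch.)
    """
    members = {i: [i] for i in range(n_leaves)}
    dist = np.zeros((n_leaves, n_leaves))
    for k, (a, b, height) in enumerate(merges):
        for i in members[a]:
            for j in members[b]:
                dist[i, j] = dist[j, i] = height
        members[n_leaves + k] = members.pop(a) + members.pop(b)
    return dist

def triplet_hierarchy_loss(emb, region_of_pixel, region_dist, margin=0.5):
    """Mean hinge loss over pixel triplets (a, p, n) with p closer to a in the tree than n."""
    idx = np.asarray(region_of_pixel)
    tree = region_dist[np.ix_(idx, idx)]                        # (P, P) tree distances
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)  # (P, P) embedding distances
    # valid[a, p, n]: p is strictly closer to a than n in the hierarchy, and p != a
    valid = tree[:, :, None] < tree[:, None, :]
    valid &= ~np.eye(len(emb), dtype=bool)[:, :, None]
    hinge = np.maximum(0.0, d[:, :, None] - d[:, None, :] + margin)
    return float(hinge[valid].mean()) if valid.any() else 0.0

# Three regions: 0 and 1 merge early (height 0.2), region 2 joins last (height 1.0).
region_dist = hierarchy_distance(3, [(0, 1, 0.2), (3, 2, 1.0)])
region_of_pixel = [0, 0, 1, 2]                                   # four sample pixels
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [5.0, 0.0]])
print(triplet_hierarchy_loss(emb, region_of_pixel, region_dist))
```

The loss is zero when every pixel is at least `margin` closer, in embedding space, to pixels that share a lower branch of the tree than to pixels that join it only higher up. The all-triplets tensor is cubic in the number of pixels, so a sketch like this only makes sense on a small sample of pixels per image.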

hierarchical grouping, hierarchy, self-supervised visual representation learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)