DINO as a von Mises-Fisher mixture model

Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

arXiv.org Artificial Intelligence 

Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method, based on a cross-entropy loss between K-dimensional probability vectors obtained by applying a softmax function to the dot product between representations and learnt prototypes. Under this formulation, DINO can be interpreted as a mixture model of von Mises-Fisher components. Using this insight, we propose DINO-vMF, which adds the appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is stable also for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model is beneficial in terms of better image representations. The DINO-vMF pre-trained model consistently performs better than DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF vs. iBOT, thereby showing the relevance of our proposed modification also for other methods derived from DINO.

Self-supervised learning (SSL) is an effective approach for pre-training models on large unlabeled datasets. The main objective of SSL pre-training is to learn representations that are transferable to a range of so-called downstream tasks. Early SSL methods achieved this through handcrafted pretext tasks that act as inductive biases in the representation learning process (Komodakis & Gidaris, 2018; Noroozi & Favaro, 2016; Kim et al., 2018; Doersch et al., 2015; Larsson et al., 2016; Zhang et al., 2016).
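To make the described modification concrete, the sketch below shows one way such vMF-corrected cluster assignment probabilities could be computed: each prototype is treated as a vMF component whose concentration is proportional to its norm, and the log normalization constant of each component is added to the usual DINO logits before the softmax. This is a minimal illustration under stated assumptions; the function names and implementation details are not taken from the authors' code.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function

def log_vmf_normalizer(kappa, d):
    """log C_d(kappa) for a von Mises-Fisher distribution on the unit sphere in R^d."""
    nu = d / 2.0 - 1.0
    # ive(nu, kappa) = I_nu(kappa) * exp(-kappa), so log I_nu(kappa) is recovered stably.
    log_bessel = np.log(ive(nu, kappa)) + kappa
    return nu * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) - log_bessel

def cluster_assignment_probs(z, prototypes, temperature=0.1, vmf_correction=True):
    """
    z:          (d,) L2-normalized representation
    prototypes: (K, d) learnt prototype vectors (not necessarily L2-normalized)
    Returns a K-dimensional vector of cluster assignment probabilities.
    """
    d = z.shape[0]
    logits = prototypes @ z / temperature            # dot-product logits, as in DINO
    if vmf_correction:
        # Illustrative vMF correction: concentration kappa_k = ||w_k|| / temperature,
        # with the component's log normalization constant added to its logit.
        kappa = np.linalg.norm(prototypes, axis=1) / temperature
        logits = logits + log_vmf_normalizer(kappa, d)
    logits = logits - logits.max()                   # numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

With `vmf_correction=False` the function reduces to the plain softmax over dot products used by DINO; the correction only changes the result when the prototypes have unequal norms, which is where the added flexibility of the mixture-model view comes in.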
