Understanding Self-supervised Learning with Dual Deep Networks

Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

arXiv.org Artificial Intelligence 

We propose a novel theoretical framework to understand self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR, BYOL). First, we prove that in each SGD update of SimCLR with various loss functions (simple contrastive loss, soft triplet loss, and InfoNCE loss), the weights at each layer are updated by a covariance operator that specifically amplifies initial random selectivities that vary across data samples but survive averages over data augmentations. We show that this leads to the emergence of hierarchical features when the input data are generated from a hierarchical latent tree model. Within the same framework, we also show analytically that in BYOL, the combination of BatchNorm and a predictor network creates an implicit contrastive term, acting as an approximate covariance operator. Additionally, for linear architectures we derive exact solutions for BYOL that provide conceptual insight into how BYOL can learn useful, non-collapsed representations without any contrastive terms that separate negative pairs. Extensive ablation studies support our theoretical findings.

Unlike supervised learning (SL), which deals with labeled data, SSL learns meaningful structures from randomly initialized networks without human-provided labels. In this paper, we propose a systematic theoretical analysis of SSL with deep ReLU networks. Our analysis imposes no parametric assumptions on the input data distribution and is applicable to state-of-the-art SSL methods that typically involve two parallel (or dual) deep ReLU networks during training (e.g., SimCLR (Chen et al., 2020a) and BYOL (Grill et al., 2020)). We do so by developing an analogy between SSL and a theoretical framework for analyzing supervised learning, namely the student-teacher setting (Tian, 2020; Allen-Zhu and Li, 2020; Lampinen and Ganguli, 2018; Saad and Solla, 1996), which also employs a pair of dual networks.
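To make the SimCLR setting concrete, below is a minimal sketch (not the paper's code) of the dual-network training step the covariance-operator result applies to: two augmented views of the same inputs pass through a deep ReLU network, and an InfoNCE loss contrasts positive pairs against in-batch negatives. The network widths, temperature 0.5, batch size, and the Gaussian-noise stand-in for data augmentation are all illustrative assumptions, not values from the paper.

# Minimal SimCLR-style SGD step with an InfoNCE loss (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """A small deep ReLU network standing in for the dual networks."""
    def __init__(self, d_in: int = 32, d_hidden: int = 64, d_out: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """InfoNCE over a batch: row i of z1 pairs with row i of z2 (positives);
    every other row in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau           # (B, B) cosine-similarity matrix
    labels = torch.arange(z1.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

encoder = Encoder()
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)
x = torch.randn(8, 32)                   # a batch of inputs
v1 = x + 0.1 * torch.randn_like(x)       # two "augmented" views (noise as a
v2 = x + 0.1 * torch.randn_like(x)       # stand-in for real augmentations)
loss = info_nce(encoder(v1), encoder(v2))
opt.zero_grad(); loss.backward(); opt.step()

The paper's result characterizes how exactly such an SGD step changes each layer's weights, namely through a covariance operator over the data distribution, after averaging over augmentations.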
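For the BYOL claim, the following sketch (again illustrative, not the paper's code) isolates the ingredients the analysis focuses on: an online network followed by a predictor containing BatchNorm, trained to match a stop-gradient exponential-moving-average (EMA) target network with no explicit negative pairs. The dimensions and the EMA rate 0.99 are assumed for illustration.

# Minimal BYOL-style step: online network + BatchNorm predictor vs. a
# stop-gradient EMA target, with BYOL's normalized-MSE loss.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

online = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
target = copy.deepcopy(online)           # target network: an EMA copy
for p in target.parameters():
    p.requires_grad_(False)
predictor = nn.Sequential(               # predictor with BatchNorm; the paper
    nn.Linear(16, 16), nn.BatchNorm1d(16),
    nn.ReLU(), nn.Linear(16, 16),        # argues this pairing yields an
)                                        # implicit contrastive term
opt = torch.optim.SGD(
    list(online.parameters()) + list(predictor.parameters()), lr=0.1)

x = torch.randn(8, 32)
v1 = x + 0.1 * torch.randn_like(x)
v2 = x + 0.1 * torch.randn_like(x)
p1 = F.normalize(predictor(online(v1)), dim=1)
with torch.no_grad():                    # stop-gradient on the target branch
    t2 = F.normalize(target(v2), dim=1)
loss = (2 - 2 * (p1 * t2).sum(dim=1)).mean()   # BYOL's normalized MSE
opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                    # EMA update of the target network
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.mul_(0.99).add_(po, alpha=0.01)

Note there is no term pushing negative pairs apart; the paper's analysis explains why this setup can nonetheless avoid collapsed representations.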
