Improving Pre-Trained Self-Supervised Embeddings Through Effective Entropy Maximization

Deep Chakraborty, Yann LeCun, Tim G. J. Rudner, Erik Learned-Miller

arXiv.org Machine Learning 

Self-supervised learning (SSL) methods are widely employed for pre-training features on unlabeled data and are highly effective for subsequent fine-tuning on a wide variety of downstream tasks [Che+20; Gri+20; Car+20; BPL21]. In this paper, we ask whether it is possible to formulate a well-motivated, general-purpose criterion that can further improve already-trained, highly optimized SSL embeddings with only a handful of epochs of continued pre-training. Like several previous works [BJ17; WI20; Liu+22; Ozs+22], we start with the principle of maximizing the entropy of embeddings. One well-known motivation for this is that, for a discrete embedding space, maximizing the entropy of a deterministic mapping preserves as much information as possible about the inputs. That is, such a maximum-entropy embedding maximizes the mutual information between the embedding and the input distribution [see, for example, Hje+18]. Similar results hold for continuous embeddings under appropriate noise models [see, for example, the discussion of the Gaussian channel in CT91]. By maximizing the amount of information retained, one hopes to prepare as well as possible for future, as-yet-unknown, discrimination tasks. Our contribution is thus not the maximization of embedding entropy, but rather how we go about it.
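As a brief sketch of the standard argument behind this motivation (the notation $X$, $Z$, $H$, and $I$ below is our own and is not fixed by the excerpt): for a deterministic encoder $Z = f(X)$ with a discrete embedding $Z$,

$$
I(X; Z) \;=\; H(Z) \;-\; H(Z \mid X) \;=\; H(Z),
$$

since $H(Z \mid X) = 0$ for a deterministic mapping. Maximizing the embedding entropy $H(Z)$ therefore maximizes the mutual information $I(X; Z)$ between the embedding and the input.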
