Yann Lecun, in his talk, introduced the "cake analogy" to illustrate the importance of self-supervised learning. Though the analogy is debated(ref: Deep Learning for Robotics(Slide 96), Pieter Abbeel), we have seen the impact of self-supervised learning in the Natural Language Processing field where recent developments (Word2Vec, Glove, ELMO, BERT) have embraced self-supervision and achieved state of the art results. "If intelligence is a cake, the bulk of the cake is self-supervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL)." Curious to know how self-supervised learning has been applied in the computer vision field, I read up on existing literature on self-supervised learning applied to computer vision through a recent survey paper by Jing et. This post is my attempt to provide an intuitive visual summary of the patterns of problem formulation in self-supervised learning.
Self-supervised representation learning has achieved impressive results in recent years, with experiments primarily coming on ImageNet or other similarly large internet imagery datasets. There has been little to no work with these methods on other smaller domains, such as satellite, textural, or biological imagery. We experiment with several popular methods on an unprecedented variety of domains. We discover, among other findings, that Rotation is by far the most semantically meaningful task, with much of the performance of Jigsaw and Instance Discrimination being attributable to the nature of their induced distribution rather than semantic understanding. Additionally, there are several areas, such as fine-grain classification, where all tasks underperform. We quantitatively and qualitatively diagnose the reasons for these failures and successes via novel experiments studying pretext generalization, random labelings, and implicit dimensionality. Code and models are available at https://github.com/BramSW/Extending_SSRL_Across_Domains/.
This paper briefly reviews the connections between meta-learning and self-supervised learning. Meta-learning can be applied to improve model generalization capability and to construct general AI algorithms. Self-supervised learning utilizes self-supervision from original data and extracts higher-level generalizable features through unsupervised pre-training or optimization of contrastive loss objectives. In self-supervised learning, data augmentation techniques are widely applied and data labels are not required since pseudo labels can be estimated from trained models on similar tasks. Meta-learning aims to adapt trained deep models to solve diverse tasks and to develop general AI algorithms. We review the associations of meta-learning with both generative and contrastive self-supervised learning models. Unlabeled data from multiple sources can be jointly considered even when data sources are vastly different. We show that an integration of meta-learning and self-supervised learning models can best contribute to the improvement of model generalization capability. Self-supervised learning guided by meta-learner and general meta-learning algorithms under self-supervision are both examples of possible combinations.
Recently, pre-training has been a hot topic in Computer Vision (and also NLP), especially one of the breakthroughs in NLP -- BERT, which proposed a method to train an NLP model by using a "self-supervised" signal. In short, we come up with an algorithm that can generate a "pseudo-label" itself (meaning a label that is true for a specific task), then we treat the learning task as a supervised learning task with the generated pseudo-label. It is commonly called "Pretext Task". For example, BERT uses mask word prediction to train the model (we can then say it is a pre-trained model after it is trained), then fine-tune the model with the task we want (usually called "Downstream Task"), e.g. The mask word prediction is to randomly mask a word in the sentence, and ask the model to predict what is that word given the sentence.
Computer Vision and Pattern Recognition (CVPR) conference is one of the most popular events around the globe where computer vision experts and researchers gather to share their work and views on the trending techniques on various computer vision topics, including object detection, video understanding, visual recognition, among others. This year, the Computer Vision (CV) researchers and engineers have gathered virtually for the conference from 14 June, which will last till 19 June. In this article, we have listed down all the important topics and tutorials that have been discussed on the 1st and 2nd day of the conference. In this tutorial, the researchers presented the latest developments in robust model fitting, recent advancements in new sampling and local optimisation methods, novel branch-and-bound and mathematical programming algorithms in the global methods as well as the latest developments in differentiable alternative to Random Sample Consensus Algorithm or RANSAC. To know what a RANSAC is and how it works, click here.