The focus was largely on supervised learning methods that require huge amounts of labeled data to train systems for specific use cases. Bidirectional Encoder Representations from Transformers (BERT) a paper published by researchers at the Google AI team has become a gold standard when it comes to several NLP tasks such as Natural Language Inference (MNLI), Question Answering (SQuAD), and more. To make BERT handle a variety of downstream tasks, input representation is able to unambiguously represent a pair of sentences that are packed together in a single sequence. While autoencoding models like BERT utilize self-supervised learning for tasks like sentence classification (next or not), another application of self-supervised approaches lies in the domain of text generation. The inputs are passed through our pre-trained model to obtain the final transformer block's activation hm l, which is then fed into an added linear output layer with parameters W y to predict y: Translation Language Modelling (TLM): a new addition and an extension of MLM, where instead of considering monolingual text streams, parallel sentences are concatenated as illustrated in the following image.
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities including image, video, sound, text and graphs. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.
Deep neural networks are typically trained under a supervised learning framework where a model learns a single task using labeled data. Instead of relying solely on labeled data, practitioners can harness unlabeled or related data to improve model performance, which is often more accessible and ubiquitous. Self-supervised pre-training for transfer learning is becoming an increasingly popular technique to improve state-of-the-art results using unlabeled data. It involves first pre-training a model on a large amount of unlabeled data, then adapting the model to target tasks of interest. In this review, we survey self-supervised learning methods and their applications within the sequential transfer learning framework. We provide an overview of the taxonomy for self-supervised learning and transfer learning, and highlight some prominent methods for designing pre-training tasks across different domains. Finally, we discuss recent trends and suggest areas for future investigation.
This paper briefly reviews the connections between meta-learning and self-supervised learning. Meta-learning can be applied to improve model generalization capability and to construct general AI algorithms. Self-supervised learning utilizes self-supervision from original data and extracts higher-level generalizable features through unsupervised pre-training or optimization of contrastive loss objectives. In self-supervised learning, data augmentation techniques are widely applied and data labels are not required since pseudo labels can be estimated from trained models on similar tasks. Meta-learning aims to adapt trained deep models to solve diverse tasks and to develop general AI algorithms. We review the associations of meta-learning with both generative and contrastive self-supervised learning models. Unlabeled data from multiple sources can be jointly considered even when data sources are vastly different. We show that an integration of meta-learning and self-supervised learning models can best contribute to the improvement of model generalization capability. Self-supervised learning guided by meta-learner and general meta-learning algorithms under self-supervision are both examples of possible combinations.
Recently, pre-training has been a hot topic in Computer Vision (and also NLP), especially one of the breakthroughs in NLP -- BERT, which proposed a method to train an NLP model by using a "self-supervised" signal. In short, we come up with an algorithm that can generate a "pseudo-label" itself (meaning a label that is true for a specific task), then we treat the learning task as a supervised learning task with the generated pseudo-label. It is commonly called "Pretext Task". For example, BERT uses mask word prediction to train the model (we can then say it is a pre-trained model after it is trained), then fine-tune the model with the task we want (usually called "Downstream Task"), e.g. The mask word prediction is to randomly mask a word in the sentence, and ask the model to predict what is that word given the sentence.