(Self-)Supervised Pre-training? Self-training? Which one to use?


Recently, pre-training has been a hot topic in Computer Vision (and also in NLP), especially since one of the breakthroughs in NLP, BERT, which proposed training an NLP model with a "self-supervised" signal. In short, we design an algorithm that generates a "pseudo-label" from the data itself (a label that is correct by construction for a specific task), and then treat learning as an ordinary supervised task on those generated pseudo-labels. Such a task is commonly called a "Pretext Task".

For example, BERT is trained with masked word prediction (once this training is done, we call it a pre-trained model), and is then fine-tuned on the task we actually care about, usually called the "Downstream Task". Masked word prediction randomly masks words in a sentence and asks the model to predict each masked word from the rest of the sentence.
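To make the pretext-task idea concrete, here is a minimal Python sketch, assuming whitespace-tokenized sentences, of how masked word prediction turns unlabeled text into (input, pseudo-label) pairs. The function name `make_masked_example`, the `[MASK]` string, and the 15% default masking rate are illustrative choices for this sketch, not BERT's actual code. Real BERT works on subword tokens and, among the tokens chosen for prediction, replaces only some with `[MASK]` (others are swapped for a random token or left unchanged); this sketch simplifies that away.

```python
import random
from typing import Dict, List, Tuple

# Illustrative placeholder string; BERT uses a dedicated [MASK] token in its vocabulary.
MASK_TOKEN = "[MASK]"


def make_masked_example(
    tokens: List[str], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], Dict[int, str]]:
    """Build one pretext-task example: a masked input plus its pseudo-labels.

    Simplified sketch: every selected token is replaced by MASK_TOKEN.
    The pseudo-labels are just the original words at the masked positions,
    so no human annotation is needed.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is reproducible
    # Mask at least one token (so there is always a target) and at most all of them.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))

    masked = list(tokens)
    labels: Dict[int, str] = {}
    for pos in positions:
        labels[pos] = tokens[pos]  # the "label" comes from the data itself
        masked[pos] = MASK_TOKEN
    return masked, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    masked, labels = make_masked_example(sentence, mask_prob=0.2, seed=1)
    print("input :", masked)
    print("labels:", labels)
```

The design point is that `labels` is computed from the sentence itself, which is what makes the task "self-supervised": a model trained to recover the masked words from context can later be fine-tuned on a downstream task.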
