Learning and Transferring Sparse Contextual Bigrams with Linear Transformers

Neural Information Processing Systems 

We show that when trained from scratch, the training process can be split into an initial sample-intensive stage where the correlation is boosted from zero to a nontrivial value, followed by a more sample-efficient stage of further improvement. Additionally, we prove that, provided a nontrivial correlation between the downstream and pretraining tasks, finetuning from a pretrained model allows us to bypass the initial sample-intensive stage.
