KD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Neural Information Processing Systems 

KD achieves consistent improvements over text-only language models and vokenization models on several downstream language understanding tasks, including GLUE, SQuAD, and SWAG.
