KD: Improving Language Understanding via Video-Distilled Knowledge Transfer
–Neural Information Processing Systems
KD achieves consistent improvements over text-only language models and vokenization models on several downstream language understanding tasks, including GLUE, SQuAD, and SWAG.
Aug-17-2025, 10:29:29 GMT
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Education > Educational Technology (0.30)
- Technology: