Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Cruz, Jan Christian Blaise, Aji, Alham Fikri
arXiv.org Artificial Intelligence
In this paper, we propose the use of simple knowledge distillation to produce smaller and more efficient single-language transformers from Massively Multilingual Transformers (MMTs), alleviating the tradeoffs associated with using MMTs in low-resource settings. Using Tagalog as a case study, we show that these smaller single-language models perform on par with strong baselines on a variety of benchmark tasks while being far more efficient. Furthermore, we investigate additional steps during the distillation process that improve the soft supervision of the target language, and provide a number of analyses and ablations to demonstrate the efficacy of the proposed method.
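The abstract describes soft supervision from a multilingual teacher to a smaller monolingual student. As a rough, hedged illustration only (the paper's exact recipe is not given here), a standard temperature-scaled knowledge distillation loss looks like the sketch below; the `teacher`/`student` models and shapes are hypothetical stand-ins.

```python
# Minimal sketch of standard knowledge distillation (an assumption, not the
# paper's exact method): a frozen multilingual teacher provides soft targets
# for a smaller single-language student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as in standard knowledge distillation."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Hypothetical usage: `teacher` is a frozen MMT, `student` a smaller
# monolingual transformer trained on target-language (e.g. Tagalog) text.
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# student_logits = student(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits)
```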
Jan-22-2025