CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT
Large language models with hundreds of millions, or even billions, of parameters have performed extremely well on a variety of natural language processing (NLP) tasks. Their widespread adoption, however, is hindered by the limited availability and portability of the large computational resources they require. This paper proposes a knowledge distillation (KD) technique building on the work of LightMBERT, a student model of multilingual BERT (mBERT). By repeatedly distilling mBERT through increasingly compressed, top-layer-distilled teacher assistant networks, CAMeMBERT aims to improve upon the time and space complexity of mBERT while keeping the loss of accuracy below an acceptable threshold. At present, CAMeMBERT achieves an average accuracy of around 60.1%, a figure that is subject to change after future improvements to the hyperparameters used in fine-tuning.
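The cascading step the abstract describes can be illustrated with a short sketch. The PyTorch code below is a minimal, illustrative version, not the paper's implementation: it assumes each network is a module mapping an input batch to classification logits, and it uses the standard Hinton-style soft-label KD objective in place of the paper's top-layer distillation loss. The names `distill_step` and `cascade_distill` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(teacher: nn.Module, student: nn.Module, loader,
                 epochs: int = 1, T: float = 2.0, lr: float = 1e-4) -> nn.Module:
    """One stage of the cascade: train `student` to match the softened
    output distribution of the (frozen) `teacher`."""
    teacher.eval()
    student.train()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:  # assumes `loader` yields input batches
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            # KL divergence between temperature-softened distributions,
            # scaled by T^2 to keep gradient magnitudes comparable.
            loss = F.kl_div(
                F.log_softmax(s_logits / T, dim=-1),
                F.softmax(t_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def cascade_distill(teacher: nn.Module, assistants: list, student: nn.Module,
                    loader) -> nn.Module:
    """Distill through increasingly compressed teacher assistants:
    teacher -> TA_1 -> ... -> TA_n -> student. Each trained assistant
    becomes the teacher for the next, smaller network in the chain."""
    current = teacher
    for ta in assistants:
        current = distill_step(current, ta, loader)
    return distill_step(current, student, loader)
```

Chaining assistants this way, as in teacher-assistant distillation, bridges the capacity gap between a large teacher and a much smaller student by spreading the compression across several intermediate steps.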
arXiv.org Artificial Intelligence
Dec-21-2022