MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model

Yao, Xin, Yang, Ziqing, Cui, Yiming, Wang, Shijin

Apr-3-2023–arXiv.org Artificial Intelligence

In natural language processing, pre-trained language models have become essential infrastructure. However, these models are often large, slow at inference, and difficult to deploy. Moreover, most mainstream pre-trained models focus on English, and small Chinese pre-trained models remain understudied. In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing. MiniRBT employs a narrow and deep student model and incorporates whole word masking and two-stage distillation during pre-training to make it well suited to most downstream tasks. Our experiments on machine reading comprehension and text classification tasks show that MiniRBT achieves 94% of RoBERTa's performance while providing a 6.8x speedup, demonstrating its effectiveness and efficiency.
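
The abstract does not spell out how the two distillation stages are arranged. As a rough illustration only, the sketch below assumes a chain in which a large teacher is first distilled into an intermediate model and that intermediate model is then distilled into the small student. It uses a generic temperature-softened KL divergence on output logits, which is a common distillation loss and not necessarily the objective used for MiniRBT. All names (`softmax`, `distillation_loss`, `teacher`, `assistant`, `student`) and all logit values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Generic soft-label loss, assumed here for illustration.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one input over three classes.
teacher = [4.0, 1.0, 0.5]    # large model
assistant = [3.5, 1.2, 0.6]  # assumed intermediate model
student = [2.0, 1.5, 1.0]    # small model

# Stage 1 (assumed): the intermediate model learns from the teacher.
stage1 = distillation_loss(teacher, assistant)
# Stage 2 (assumed): the student learns from the intermediate model.
stage2 = distillation_loss(assistant, student)

print(f"stage 1 loss: {stage1:.4f}")
print(f"stage 2 loss: {stage2:.4f}")
```

In a real training loop each stage would minimize its loss over a corpus by gradient descent; the sketch only evaluates the loss once per stage to show which model supervises which.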

computational linguistics, natural language, text classification


Country:
- Asia > China (0.47)
- North America > United States
  - Minnesota (0.28)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (0.90)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Text Classification (0.36)
