LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo, Zhenhua Liu, Zhenyang Li, Jianbo Tang
Language model pre-training based on large corpora has achieved tremendous success in constructing enriched contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) tasks. Despite this success, most current pre-trained language models, such as BERT, are trained with single-grained tokenization, usually fine-grained characters or sub-words, which makes it hard for them to learn the precise meaning of coarse-grained words and phrases. In this paper, we propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of the input text. Our method can be applied to various pre-trained language models and improves their representation capability. Extensive experiments conducted on CLUE and SuperGLUE demonstrate that our method achieves comprehensive improvements on a wide variety of NLU tasks in both Chinese and English with little extra inference cost, and that our best ensemble model achieves state-of-the-art performance on the CLUE benchmark.
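To make the idea of incorporating multi-grained information concrete, the sketch below shows one plausible way to fuse fine-grained (character/sub-word) and coarse-grained (word/phrase) token embeddings at the input layer of a BERT-style model. The class name, the alignment of coarse-grained ids to fine-grained positions, and the element-wise max-pooling fusion are assumptions made for illustration based on the abstract's description, not a confirmed account of LICHEE's exact design.

```python
import torch
import torch.nn as nn

class MultiGrainedEmbedding(nn.Module):
    """Hypothetical sketch: fuse fine-grained (char/sub-word) and
    coarse-grained (word/phrase) embeddings before the encoder.
    The max-pooling fusion is an assumption for illustration only."""

    def __init__(self, fine_vocab_size: int, coarse_vocab_size: int, hidden_size: int):
        super().__init__()
        self.fine_emb = nn.Embedding(fine_vocab_size, hidden_size)
        self.coarse_emb = nn.Embedding(coarse_vocab_size, hidden_size)

    def forward(self, fine_ids: torch.Tensor, coarse_ids: torch.Tensor) -> torch.Tensor:
        # fine_ids, coarse_ids: (batch, seq_len). coarse_ids[b, t] is assumed to be
        # the id of the coarse-grained token (word/phrase) covering fine-grained
        # position t, so both sequences share the same length.
        fine = self.fine_emb(fine_ids)        # (batch, seq_len, hidden)
        coarse = self.coarse_emb(coarse_ids)  # (batch, seq_len, hidden)
        # Element-wise max pooling merges the two granularities without changing
        # the sequence length, which keeps extra inference cost small.
        return torch.maximum(fine, coarse)
```

In such a setup, the fused embeddings would replace the token embeddings of any BERT-style encoder, with position and segment embeddings added as usual, which is consistent with the abstract's claim that the method can be applied to various pre-trained language models.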
arXiv.org Artificial Intelligence
Aug-3-2021