Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss

Wang, Junjie, Zhang, Yuxiang, Yang, Ping, Gan, Ruyi

Aug-4-2022–arXiv.org Artificial Intelligence

This report describes Erlangshen, a pre-trained language model with a propensity-corrected loss, which ranked No. 1 in the CLUE Semantic Matching Challenge. In the pre-training stage, we construct a knowledge-based dynamic masking strategy for Masked Language Modeling (MLM) with whole word masking. Furthermore, based on the specific structure of the dataset, the pre-trained Erlangshen applies a propensity-corrected loss (PCL) in the fine-tuning phase. Overall, we achieve an F1 score of 72.54 and an accuracy of 78.90 on the test set. Our code is publicly available at: https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/hf-ds/fengshen/examples/clue_sim.
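The abstract names whole word masking with dynamic masking but does not spell out the mechanics. The following is a minimal sketch of the generic technique only, not the authors' knowledge-based strategy. It assumes the text is already segmented into words, each given as a list of sub-tokens. The function name `whole_word_mask`, the `mask_prob` parameter, and the literal `[MASK]` token are illustrative assumptions, not taken from the Erlangshen code.

```python
import random

MASK = "[MASK]"  # assumed mask token, for illustration only


def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask whole words: all sub-tokens of a chosen word are masked together.

    `words` is a list of words, each a list of sub-tokens (hypothetical input
    format). Returns (masked_tokens, labels); labels hold the original token
    at masked positions and None elsewhere.
    """
    rng = rng or random.Random()
    # Mask at least one word when possible, and never more words than exist.
    n_to_mask = min(len(words), max(1, round(len(words) * mask_prob)))
    chosen = set(rng.sample(range(len(words)), n_to_mask))

    masked, labels = [], []
    for i, word in enumerate(words):
        for tok in word:
            if i in chosen:
                masked.append(MASK)
                labels.append(tok)
            else:
                masked.append(tok)
                labels.append(None)
    return masked, labels


if __name__ == "__main__":
    # Three segmented Chinese words; segmentation is assumed to be given.
    sentence = [["语", "言"], ["模", "型"], ["预", "训", "练"]]
    # "Dynamic" masking: draw a fresh mask each time the example is seen,
    # instead of fixing one mask during preprocessing.
    for epoch in range(3):
        print(epoch, whole_word_mask(sentence, mask_prob=0.34))
```

Calling the function again on every epoch is what makes the masking dynamic in this sketch: the same sentence is seen with different masked words over the course of training.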

dataset, erlangshen, propensity-corrected loss
