Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
– arXiv.org Artificial Intelligence
We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
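As a rough illustration of the pipeline described in the abstract, the sketch below first collects in-domain keywords with KeyBERT and then masks matching word pieces when preparing MLM inputs with a Hugging Face BERT tokenizer. The keyword count, the 15% masking budget, the surface-form matching heuristic, and the random top-up fallback are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of keyword-guided masking for in-domain MLM pre-training,
# assuming KeyBERT and Hugging Face transformers. Thresholds and the fallback
# step are illustrative choices, not the paper's exact settings.
import random

import torch
from keybert import KeyBERT
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-large-uncased")
kw_model = KeyBERT()  # default sentence-transformers backbone


def extract_domain_keywords(docs, top_n=20):
    """Build a set of in-domain keywords over the target-domain corpus."""
    keywords = set()
    for doc in docs:
        for word, _score in kw_model.extract_keywords(
            doc, keyphrase_ngram_range=(1, 1), stop_words="english", top_n=top_n
        ):
            keywords.add(word.lower())
    return keywords


def mask_in_domain_keywords(text, keywords, mask_prob=0.15):
    """Mask word pieces whose surface form matches an in-domain keyword;
    top up with random tokens so roughly `mask_prob` of the input is masked."""
    enc = tokenizer(text, truncation=True)
    input_ids = list(enc["input_ids"])
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    labels = [-100] * len(input_ids)  # -100 = ignored by the MLM loss

    specials = set(tokenizer.all_special_tokens)
    budget = max(1, int(mask_prob * len(input_ids)))

    # Prefer positions that belong to in-domain keywords.
    keyword_pos = [
        i for i, tok in enumerate(tokens)
        if tok not in specials and tok.lstrip("#") in keywords
    ]
    random.shuffle(keyword_pos)
    chosen = keyword_pos[:budget]

    # Fall back to random masking if too few keyword tokens are present.
    if len(chosen) < budget:
        rest = [
            i for i, tok in enumerate(tokens)
            if i not in set(chosen) and tok not in specials
        ]
        chosen += random.sample(rest, min(budget - len(chosen), len(rest)))

    for i in chosen:
        labels[i] = input_ids[i]
        input_ids[i] = tokenizer.mask_token_id
    return torch.tensor(input_ids), torch.tensor(labels)
```

In such a setup, the keyword set would be computed once over the in-domain corpus, and a function like `mask_in_domain_keywords` would stand in for the usual random-masking collator during the intermediate pre-training stage, before task-specific fine-tuning.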
Jul-14-2023
- Country:
  - North America > United States
    - Arizona > Pima County > Tucson (0.14)
    - Minnesota > Hennepin County > Minneapolis (0.14)
- Genre:
  - Research Report
    - Experimental Study (0.69)
    - New Finding (1.00)
- Industry:
  - Health & Medicine (1.00)