Dialogue-adaptive Language Model Pre-training From Quality Estimation
Li, Junlong, Zhang, Zhuosheng, Zhao, Hai
–arXiv.org Artificial Intelligence
Pre-trained language models (PrLMs) have achieved great success on a wide range of natural language processing tasks by virtue of the universal language representation ability obtained by self-supervised learning on a large corpus. These models are pre-trained on standard plain texts with general language model (LM) training objectives, which would be insufficient to model dialogue-exclusive attributes like specificity and informativeness reflected in these tasks that are not explicitly captured by the pre-trained universal language representations. In this work, we propose dialogue-adaptive pre-training objectives (DAPO) derived from quality estimation to simulate dialogue-specific features, namely coherence, specificity, and informativeness. As the foundation for model pre-training, we synthesize a new dialogue corpus and build our training set with two unsupervised methods: 1) coherence-oriented context corruption, including utterance ordering, insertion, and replacement, to help the model capture the coherence inside the dialogue contexts; and 2) specificity-oriented automatic rescoring, which encourages the model to measure the quality of the synthesized data for dialogue-adaptive pre-training by considering specificity and informativeness. Experimental results on widely used open-domain response selection and quality estimation benchmarks show that DAPO significantly improves the baseline models and achieves state-of-the-art performance on the MuTual leaderboard, verifying the effectiveness of estimating quality evaluation factors into pre-training.
arXiv.org Artificial Intelligence
Oct-20-2022
- Country:
- Oceania > Australia
- North America
- United States
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > San Diego County
- San Diego (0.04)
- Pennsylvania > Philadelphia County
- Canada > British Columbia
- United States
- Europe
- Asia
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Research Report (1.00)
- Technology: