Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
Zhang, Jingshen; Chen, Xinglu; Qiu, Xinying; Wang, Zhimin; Feng, Wenhe
arXiv.org Artificial Intelligence
Chinese sentence simplification is hindered by the lack of large-scale labeled parallel corpora and by the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation with lexical simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sentence pairs, and (2) Idiom-aware Simplification (IAS), a model that improves the comprehension and simplification of idiomatic expressions. By integrating RPS and IAS through multi-stage and multi-task learning strategies, RISS outperforms previous state-of-the-art methods on two Chinese sentence simplification datasets. Furthermore, RISS achieves additional improvements when fine-tuned on a small labeled dataset. Our approach demonstrates the potential for more effective and accessible Chinese text simplification.
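The abstract names RPS only as a method for mining high-quality sentence pairs and does not specify how readability is measured. The following is a minimal sketch of the general idea, under stated assumptions: `toy_readability` is a hypothetical stand-in proxy (sentence length plus the share of characters outside a given common-character set), not the paper's readability model, and `select_pairs`, `min_gain` and `max_len_ratio` are illustrative names. The sketch keeps a paraphrase pair only when one side is clearly easier to read, and orients it as complex → simple.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class SimplificationPair:
    complex_sent: str
    simple_sent: str
    gain: float  # readability improvement of simple_sent over complex_sent


def toy_readability(sentence: str, common_chars: frozenset) -> float:
    """Hypothetical proxy (higher = easier): penalises length and rare characters."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return 0.0
    rare_ratio = sum(c not in common_chars for c in chars) / len(chars)
    return -(len(chars) / 10.0) - 5.0 * rare_ratio


def select_pairs(
    paraphrases: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
    min_gain: float = 0.5,
    max_len_ratio: float = 2.0,
) -> List[SimplificationPair]:
    """Keep paraphrase pairs whose readability gap is large enough, oriented complex -> simple."""
    selected = []
    for a, b in paraphrases:
        sa, sb = score(a), score(b)
        if sa == sb:
            continue  # no readability difference, so no direction to learn
        # The easier-to-read side becomes the simplification target.
        if sb > sa:
            complex_s, simple_s, gain = a, b, sb - sa
        else:
            complex_s, simple_s, gain = b, a, sa - sb
        if gain < min_gain:
            continue
        # Crude meaning-preservation filter (assumption): reject very unequal lengths.
        la, lb = len(complex_s), len(simple_s)
        if min(la, lb) == 0 or max(la, lb) / min(la, lb) > max_len_ratio:
            continue
        selected.append(SimplificationPair(complex_s, simple_s, gain))
    return selected


if __name__ == "__main__":
    common = frozenset("他很着急地打开了礼物")
    pairs = select_pairs(
        [("他很着急地打开了礼物", "他迫不及待地打开了礼物"), ("他打开了礼物", "他打开礼物")],
        lambda s: toy_readability(s, common),
    )
    for p in pairs:
        print(p.complex_sent, "->", p.simple_sent, round(p.gain, 3))
```

In this toy run, the pair containing the idiom 迫不及待 is kept and reoriented so that the idiomatic sentence is the source, while the near-identical second pair is discarded because its readability gap is below `min_gain`. A real pipeline would substitute a trained readability model and a semantic-similarity check for the two heuristics assumed here.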
Jun-5-2024
- Country:
  - Asia
    - China > Guangdong Province
      - Guangzhou (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
  - Europe > Netherlands (0.04)
- Genre:
  - Research Report (1.00)
- Technology: