Multi-Criteria Chinese Word Segmentation with Transformer

Xipeng Qiu, Hengzhi Pei, Hang Yan, Xuanjing Huang

arXiv.org Artificial Intelligence 

Differing linguistic perspectives have produced many distinct segmentation criteria for Chinese word segmentation (CWS). Most existing methods focus on improving single-criterion CWS. However, these heterogeneous criteria share underlying knowledge that is worth exploiting. In this paper, we propose a concise and effective model for multi-criteria CWS, which uses a shared fully-connected self-attention model to segment a sentence according to a criterion indicator. Experiments on eight datasets with heterogeneous segmentation criteria show that performance on each corpus improves significantly compared to single-criterion learning.
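
To illustrate what segmenting "according to a criterion indicator" could look like at the input and output level, here is a minimal Python sketch. It is not the authors' implementation: the criterion names, the bracketed indicator-token format, the helper names `build_input` and `tags_to_words`, and the B/M/E/S tagging scheme are all assumptions made for illustration, and the shared self-attention model itself is omitted.

```python
from typing import List

# Hypothetical criterion names; the abstract does not list the eight datasets.
CRITERIA = ["pku", "msr"]


def build_input(sentence: str, criterion: str) -> List[str]:
    """Prepend an (assumed) criterion indicator token to the character sequence.

    A shared model could read this token to decide which segmentation
    criterion to apply to the characters that follow.
    """
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion}")
    return [f"[{criterion}]"] + list(sentence)


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Decode per-character B/M/E/S tags into words (assumed tagging scheme).

    B = begin, M = middle, E = end of a multi-character word, S = single.
    """
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        # Flush an unfinished word left by a malformed tag sequence.
        words.append(current)
    return words
```

Under these assumptions, the same sentence would be passed to the shared model once per criterion, with only the leading indicator token changing, and the predicted tags would be decoded with `tags_to_words`.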
