Co-Occurrence-Based Error Correction Approach to Word Segmentation
Chaowicharat, Ekawat (Mahidol University) | Naruedomkul, Kanlaya (Mahidol University)
To address the difficulties of Thai word segmentation, many segmentation methods have been proposed over the years. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation using co-occurrence information and an error correction algorithm. CBEC was trained and evaluated on the BEST 2009 corpus.
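The candidate-then-select idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicon, the pair counts, and the scoring rule (sum of adjacent-word co-occurrence counts, ties broken by fewest words in the spirit of maximal matching) are all assumptions made for illustration, and the error correction step is not modelled. An unspaced English string stands in for Thai text.

```python
from collections import Counter


def segmentations(text, lexicon):
    """Enumerate every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


def cooccurrence_score(words, pair_counts):
    """Sum the (hypothetical) co-occurrence counts of adjacent word pairs."""
    return sum(pair_counts[(a, b)] for a, b in zip(words, words[1:]))


def best_segmentation(text, lexicon, pair_counts):
    """Pick the candidate with the highest co-occurrence score.

    Ties are broken in favour of fewer words, which mimics the
    preference of maximal matching. Returns None if no candidate exists.
    """
    candidates = segmentations(text, lexicon)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda ws: (cooccurrence_score(ws, pair_counts), -len(ws)),
    )


# Toy data, invented for this example only.
LEXICON = {"no", "now", "where", "here", "nowhere"}
PAIR_COUNTS = Counter({("now", "here"): 3})

if __name__ == "__main__":
    print(segmentations("nowhere", LEXICON))
    print(best_segmentation("nowhere", LEXICON, PAIR_COUNTS))
```

With an empty co-occurrence table every candidate scores zero, so the tie-break alone decides and the single-word reading wins. A real system would estimate the counts from a training corpus such as the one named in the abstract.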
May-18-2011
- Technology: