The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation
Sälevä, Jonne, Lignos, Constantine
arXiv.org Artificial Intelligence
This paper evaluates the performance of several modern subword segmentation methods in a low-resource neural machine translation setting. We compare segmentations produced by applying BPE at the token or sentence level with morphologically-based segmentations from LMVR and MORSEL. We evaluate translation tasks between English and each of Nepali, Sinhala, and Kazakh, and hypothesize that morphologically-based segmentation methods will lead to better performance in this setting. However, we find no consistent and reliable differences between BPE and the morphologically-based methods. While morphologically-based methods outperform BPE in a few cases, the best-performing method varies across tasks, and the performance of the segmentation methods is often statistically indistinguishable.
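To make the BPE baseline concrete, here is a minimal sketch of token-level BPE: merges are learned over whitespace-separated words, each ending in an end-of-word marker `</w>`, by repeatedly merging the most frequent adjacent symbol pair. This is an illustrative assumption, not the implementation used in the paper, and the names `learn_bpe`, `merge_pair`, and `segment` are hypothetical. Sentence-level variants (for example, SentencePiece-style BPE) instead operate on raw text with whitespace treated as an ordinary symbol; that variant is not shown.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus_words, num_merges):
    """Learn up to `num_merges` BPE merge operations from a list of word tokens (token-level BPE)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["low", "low", "lower"], num_merges=3)
    print(segment("lowest", merges))  # -> ['low', 'e', 's', 't', '</w>']
```

Because the merges are purely frequency-driven, the unseen word "lowest" is split into the frequent substring "low" plus leftover characters, regardless of whether those boundaries are linguistic morphemes. Morphologically-based segmenters such as LMVR and MORSEL are motivated by this contrast.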
Mar-20-2021