Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment
–arXiv.org Artificial Intelligence
A reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For the Myanmar-English language pair, no reference corpus exists for evaluating word alignment tasks. We therefore created guidelines for Myanmar-English word alignment annotation, based on contrastive learning between the two languages, and built a Myanmar-English reference corpus consisting of verified alignments from the Myanmar portion of the Asian Language Treebank (ALT). The reference corpus contains the confidence labels sure (S) and possible (P) for word alignments, which are used to evaluate word alignment tasks. We discuss the main linking ambiguities in order to define consistent and systematic instructions for manual word alignment. We evaluated annotator agreement on our reference corpus in terms of alignment error rate (AER) in word alignment tasks, and we discuss word relationships in terms of BLEU scores.

A bilingual corpus aligned at the level of sentences or words is a precious resource for developing machine translation systems. Word alignment is a fundamental step in extracting translation information from a bilingual corpus: it determines which words and phrases in the original and translated sentences are translations of each other. In most translation systems, translational correspondences are rather complex, particularly for a language pair such as Myanmar and English, whose word orders differ.
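To make the role of the sure (S) and possible (P) labels concrete, the following is a minimal sketch of how AER is commonly computed, using the standard definition AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links. The function name, the representation of links as (source index, target index) pairs, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def alignment_error_rate(sure, possible, predicted):
    """Return AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    This is an illustrative sketch, not the authors' evaluation script.
    """
    sure = set(sure)
    # By convention every sure link also counts as a possible link.
    possible = set(possible) | sure
    predicted = set(predicted)

    denominator = len(predicted) + len(sure)
    if denominator == 0:
        # No links on either side: treat as zero error (assumed convention).
        return 0.0

    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator


# Hypothetical toy sentence pair with two sure links and one possible link.
sure_links = {(0, 0), (1, 2)}
possible_links = {(2, 1)}
predicted_links = {(0, 0), (1, 2), (2, 3)}

aer = alignment_error_rate(sure_links, possible_links, predicted_links)
print(f"AER = {aer:.2f}")
```

In this toy case the predicted alignment recovers both sure links but adds one link that is neither sure nor possible, so the error is penalized only through the extra link. A lower AER indicates closer agreement with the reference.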
Sep-25-2019
- Country:
- Africa > Middle East
- Egypt > Giza Governorate > Giza (0.04)
- Asia > Myanmar
- Mandalay Region > Mandalay (0.05)
- Yangon Region > Yangon (0.04)
- Europe
- Netherlands > South Holland
- Dordrecht (0.04)
- Portugal (0.04)
- Spain
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- North America
- Genre:
- Research Report (0.40)
- Technology: