Marmara Turkish Coreference Corpus and Coreference Resolution Baseline

Schüller, Peter, Cıngıllı, Kübra, Tunçer, Ferit, Sürmeli, Barış Gün, Pekel, Ayşegül, Karatay, Ayşe Hande, Karakaş, Hacer Ezgi

arXiv.org Artificial Intelligence 

Coreference Resolution is the task of identifying groups of phrases in a text that refer to the same discourse entity. Such referring phrases are called mentions; a set of mentions that all refer to the same discourse entity is called a coreference chain. Annotated corpora are important resources for developing and evaluating automatic coreference resolution methods. Turkish is an agglutinative language, and Turkish coreference resolution poses several challenges that differ from those in many other languages, in particular the absence of grammatical gender, the possibility of null pronouns in subject and object position, possessive pronouns that can be expressed as suffixes, and ambiguities among possessive and number morphemes, e.g., 'çocukları' can be analysed as 'their children' or as 'his/her children', depending on context (Oflazer and Bozşahin, 1994). No coreference resolution corpus exists for Turkish so far. Here we describe the result of an effort to create such a corpus based on the METU-Sabanci Turkish Treebank (Say, Zeyrek, Oflazer, and Özge, 2004; Atalay, Oflazer, and Say, 2003; Oflazer, Say, Hakkani-Tür, and Tür, 2003), which is, to the best of our knowledge, the only publicly available Turkish Treebank. Our contributions are as follows.
