Marmara Turkish Coreference Corpus and Coreference Resolution Baseline

Schüller, Peter, Cıngıllı, Kübra, Tunçer, Ferit, Sürmeli, Barış Gün, Pekel, Ayşegül, Karatay, Ayşe Hande, Karakaş, Hacer Ezgi

Jul-31-2018, arXiv.org Artificial Intelligence

Coreference resolution is the task of identifying groups of phrases in a text that refer to the same discourse entity. Such referring phrases are called mentions, and a set of mentions that all refer to the same discourse entity is called a coreference chain. Annotated corpora are important resources for developing and evaluating automatic coreference resolution methods. Turkish is an agglutinative language, and Turkish coreference resolution poses several challenges that differ from those of many other languages, in particular the absence of grammatical gender, the possibility of null pronouns in subject and object position, possessive pronouns that can be expressed as suffixes, and ambiguities among possessive and number morphemes. For example, 'çocukları' can be analysed as 'their children' or as 'his/her children', depending on context (Oflazer and Bozşahin, 1994). No coreference resolution corpus exists for Turkish so far. We here describe the result of an effort to create such a corpus based on the METU-Sabanci Turkish Treebank (Say, Zeyrek, Oflazer, and Özge, 2004; Atalay, Oflazer, and Say, 2003; Oflazer, Say, Hakkani-Tür, and Tür, 2003), which is, to the best of our knowledge, the only publicly available Turkish Treebank. Our contributions are as follows.



