A Dataset of Kurdish (Sorani) Named Entities -- An Amendment to Kurdish-BLARK Named Entities
Salar, Sazan, Hassani, Hossein
arXiv.org Artificial Intelligence
Named Entity Recognition (NER) is one of the essential applications of Natural Language Processing (NLP). It also plays a significant role in many other NLP applications, such as Machine Translation (MT), Information Retrieval (IR), and Part of Speech Tagging (POST). Kurdish is an under-resourced language from the NLP perspective. In particular, the lack of NER resources across all categories hinders other aspects of Kurdish processing. In this work, we present a dataset that covers several categories of named entities (NEs) in Kurdish (Sorani). The dataset is a significant amendment to a previously developed dataset in the Kurdish BLARK (Basic Language Resource Kit). It covers 11 categories and contains 33261 entries in total. The dataset is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license at https://kurdishblark.github.io/.
Jan-12-2023
- Country:
- Asia > Middle East > Iraq
- Erbil Governorate > Erbil (0.07)
- Kurdistan Region > Duhok Governorate
- Duhok (0.07)
- Genre:
- Research Report (0.40)
- Industry:
- Government > Regional Government (0.41)