Creating a morphological and syntactic tagged corpus for the Uzbek language

Sharipov, Maksud; Mattiev, Jamolbek; Sobirov, Jasur; Baltayev, Rustam

Oct-27-2022, arXiv.org Artificial Intelligence

The creation of tagged corpora is becoming one of the most important tasks in Natural Language Processing (NLP). There are not enough tagged corpora to build machine learning models for the low-resource Uzbek language. In this paper, we try to fill that gap by developing a novel Part-of-Speech (POS) and syntactic tagset for creating a morphologically and syntactically tagged corpus of the Uzbek language. This work also includes a detailed description and presentation of a web-based application for carrying out the tagging. Based on the developed annotation tool and software, we share the results of our experience from the first stage of tagged corpus creation.

artificial intelligence, machine learning, natural language, (20 more...)

Country:
- Europe > Slovenia
  - Coastal-Karst > Municipality of Koper > Koper (0.04)
- Asia > Uzbekistan
  - Toshkent Shahri > Tashkent (0.14)
  - Navoiy Region > Navoiy (0.05)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Grammars & Parsing (0.72)
    - Text Processing (0.68)
