KurdSTS: The Kurdish Semantic Textual Similarity

Abdullah, Abdulhady Abas, Veisi, Hadi, Al, Hussein M.

arXiv.org Artificial Intelligence 

Semantic Textual Similarity (STS) measures the degree of semantic equivalence between two texts and is important in many Natural Language Processing tasks. While extensive resources have been developed for high-resource languages, low-resource languages such as Kurdish have been neglected. This paper introduces the first STS dataset for Kurdish, which aims to close this gap. The dataset contains 10,000 formal and informal sentence pairs annotated for similarity. Several models are benchmarked on it, including Sentence Bidirectional Encoder Representations from Transformers (Sentence-BERT) and multilingual Bidirectional Encoder Representations from Transformers (multilingual BERT), among others; they achieve promising results while also showcasing the difficulties presented by the distinctive nature of Kurdish. This work paves the way for future studies in Kurdish semantic research and in Natural Language Processing for other low-resource languages.
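
As a rough illustration of how an STS benchmark of this kind is often scored, the sketch below compares cosine similarities of sentence embeddings against human similarity labels using Pearson correlation. Everything in it is an assumption made for illustration: the abstract does not state the evaluation metric or the annotation scale, the two-dimensional vectors stand in for real model embeddings, and the 0 to 5 gold scores are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy embeddings for three sentence pairs (not real model output).
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction
    ([1.0, 0.0], [1.0, 1.0]),  # partially aligned
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
# Hypothetical human similarity labels on an assumed 0-5 scale.
gold = [5.0, 3.0, 0.0]

predicted = [cosine_similarity(u, v) for u, v in pairs]
score = pearson(predicted, gold)
print(f"Pearson correlation with gold labels: {score:.3f}")
```

In practice the toy vectors would be replaced by embeddings from a model such as Sentence-BERT, and the gold list by the dataset's annotated scores.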
