KyrgyzBERT: A Compact, Efficient Language Model for Kyrgyz NLP

Metinov, Adilet, Kudakeeva, Gulida M., Kabaeva, Gulnara D.

arXiv.org Artificial Intelligence 

Kyrgyz remains a low-resource language with limited foundational NLP tools. To address this gap, we introduce KyrgyzBERT, the first publicly available monolingual BERT-based language model for Kyrgyz. The model has 35.9M parameters and uses a custom tokenizer designed for the language's morphological structure. To evaluate performance, we create kyrgyz-sst2, a sentiment analysis benchmark built by translating the Stanford Sentiment Treebank and manually annotating the full test set. KyrgyzBERT fine-tuned on this dataset achieves an F1-score of 0.8280, competitive with a fine-tuned mBERT model five times larger. All models, data, and code are released to support future research in Kyrgyz NLP.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found