KuBERT: Central Kurdish BERT Model and Its Application for Sentiment Analysis

Kozhin Muhealddin Awlla, Hadi Veisi, Abdulhady Abas Abdullah

arXiv.org Artificial Intelligence 

This paper advances sentiment analysis for the Central Kurdish language by integrating Bidirectional Encoder Representations from Transformers (BERT) into the Natural Language Processing pipeline. Kurdish is a low-resourced language with high linguistic diversity and minimal computational resources, which makes sentiment analysis challenging. Earlier work relied on traditional word embedding models such as Word2Vec, but the emergence of contextual language models, specifically BERT, promises improvements. BERT's stronger contextual word embeddings help capture the nuanced semantics and contextual intricacies of Kurdish, setting a new benchmark for sentiment analysis in low-resource languages. The steps include collecting and normalizing a large corpus of Kurdish texts, pretraining BERT with a tokenizer built for Kurdish, and developing several sentiment classifiers: a Bidirectional Long Short-Term Memory (BiLSTM) network, a Multi-Layer Perceptron (MLP), and a fine-tuned BERT classifier. The proposed approach performs three-class sentiment analysis (positive, negative, and neutral) using BERT sentiment embeddings in four different configurations. The best-performing BiLSTM classifier reaches 74.09% accuracy, the BERT-with-MLP model reaches a maximum of 73.96%, and the fine-tuned BERT model tops the others with 75.37%. Additionally, the fine-tuned BERT model improves substantially when restricted to two-class (positive and negative) sentiment analysis, with an accuracy of 86%.
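The BiLSTM configuration described above (token embeddings fed to a bidirectional LSTM, whose final states drive a three-class softmax) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions are illustrative, and a trainable embedding table stands in for the pretrained Kurdish BERT embeddings the authors use.

```python
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    """BiLSTM sentiment classifier over token embeddings.

    In the paper the embeddings come from the pretrained Kurdish BERT;
    here a randomly initialized embedding table is used as a stand-in,
    and vocab/hidden sizes are illustrative assumptions.
    """
    def __init__(self, vocab_size=32000, embed_dim=768, hidden=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        # Three output classes: positive, negative, neutral.
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids):
        x = self.embed(input_ids)              # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(x)               # h: (2, batch, hidden)
        # Concatenate the final forward and backward hidden states.
        h_cat = torch.cat([h[0], h[1]], dim=-1)  # (batch, 2 * hidden)
        return self.fc(h_cat)                  # (batch, num_classes) logits

model = BiLSTMSentiment()
logits = model(torch.randint(0, 32000, (4, 32)))  # batch of 4 sequences, length 32
print(tuple(logits.shape))  # (4, 3): one logit per sentiment class
```

For two-class experiments such as the positive/negative setup reported above, `num_classes` would simply be set to 2.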
