AITopics | central kurdish

Collaborating Authors

central kurdish

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

KurdSTS: The Kurdish Semantic Textual Similarity

Abdullah, Abdulhady Abas, Veisi, Hadi, Al, Hussein M.

arXiv.org Artificial IntelligenceDec-1-2025

Semantic Textual Similarity measures the degree of equivalence between the two texts and is important in many Natural Language Processing tasks. While extensive resources have been developed for high - resource languages, unfortunately, low - resource languages, for example, Kurdish, have been neglected. In this paper, the first STS dataset for K urdish has been introduced, which aims to alleviate this gap. This dataset contains 10,000 formal and informal sentence pairs annotated for similarity. To this end, aft er benchmarking several models, such as Sentence Bidirectional Encoder Representations from Transformers (Sentence - BERT) and multilingual Bidirectional Encoder Representations from Transformers (multilingual BERT), among others, which achieved promising results while also showcasing the difficulties presented by the distinctive nature of Kurdish. This work paves the way for future studies in Kurdish semantic research and Natural Language Processing in general for other low - resource languages.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.02336

Country: Asia > Middle East > Iraq > Kurdistan Region (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification

Abdullah, Abdulhady Abas, Badawi, Soran, Abdullah, Dana A., Hamad, Dana Rasul

arXiv.org Artificial IntelligenceOct-7-2025

The complexity and difficulties of Kurdish speaker detection among its several dialects are investigated in this work. Because of its great phonetic and lexical differences, Kurdish with several dialects including Kurmanji, Sorani, and Hawrami offers special challenges for speaker recognition systems. The main difficulties in building a strong speaker identification system capable of precisely identifying speakers across several dialects are investigated in this work. To raise the accuracy and dependability of these systems, it also suggests solutions like sophisticated machine learning approaches, data augmentation tactics, and the building of thorough dialect-specific corpus. The results show that customized strategies for every dialect together with cross-dialect training greatly enhance recognition performance.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.04629

Country: Asia > Middle East > Iraq > Kurdistan Region (0.29)

Genre: Research Report > New Finding (0.87)

Industry:

Government (0.68)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

KuBERT: Central Kurdish BERT Model and Its Application for Sentiment Analysis

Awlla, Kozhin muhealddin, Veisi, Hadi, Abdullah, Abdulhady Abas

arXiv.org Artificial IntelligenceSep-23-2025

This paper enhances the study of sentiment analysis for the Central Kurdish language by integrating the Bidirectional Encoder Representations from Transformers (BERT) into Natural Language Processing techniques. Kurdish is a low - resourced language, having a high level of linguistic diversity with minimal computational resources, making sentiment analysis somewhat challenging. Earlier, this was done using a traditional w ord embedding model, such as Word2Vec, but with the emergence of new language models, specifically BERT, there is hope for improvements. The better word embedding capabilities of BERT lend to this study, aiding in the capturing of the nuanced semantic pool and the contextual intricacies of the language under study, the Kurdish language, thus setting a new benchmark for sentiment analysis in low - resource languages. The steps include collecting and normalizing a large corpus of Kurdish texts, pretraining BERT with a special tokenizer for Kurdish, and developing different models for sentiment analysis including Bidirectional Long Short - Term Memory ( BiLSTM), Multi - L ayer Perceptron ( MLP), and finetuning the BERT classifier . The proposed approach consists of 3 cla sses: positive, negative, and neutral sentiment analysis using a sentiment embedding of BERT in four different configurations. The accuracy of the best - performing classifier, BiLSTM, is 74.09%. For the BERT with an MLP classifier model, the maximum accuracy achieved is 73.96%, while the fine - tuned BERT model tops the others with 75.37% accuracy. Additionally, the fine - tuned BERT model demonstrates a vast improvement when focused on t wo 2 - class sentiment analyses positive and negative with an accuracy of 86.

machine learning, natural language, sentiment analysis, (18 more...)

arXiv.org Artificial Intelligence

2509.16804

Country: Asia > Middle East > Iraq (0.68)

Genre: Research Report > New Finding (0.94)

Industry:

Media (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach

Abdullah, Abdulhady Abas, Muhamad, Sabat Salih, Veisi, Hadi

arXiv.org Artificial IntelligenceSep-24-2024

The ability to synthesize spoken language from text has greatly facilitated access to digital content with the advances in text-to-speech technology. However, effective TTS development for low-resource languages, such as Central Kurdish (CKB), still faces many challenges due mainly to the lack of linguistic information and dedicated resources. In this paper, we improve the Kurdish TTS system based on Tacotron by training the Kurdish WaveGlow vocoder on a 21-hour central Kurdish speech corpus instead of using a pre-trained English vocoder WaveGlow. Vocoder training on the target language corpus is required to accurately and fluently adapt phonetic and prosodic changes in Kurdish language. The effectiveness of these enhancements is that our model is significantly better than the baseline system with English pretrained models. In particular, our adaptive WaveGlow model achieves an impressive MOS of 4.91, which sets a new benchmark for Kurdish speech synthesis. On one hand, this study empowers the advanced features of the TTS system for Central Kurdish, and on the other hand, it opens the doors for other dialects in Kurdish and other related languages to further develop.

speech, synthesis, vocoder, (16 more...)

arXiv.org Artificial Intelligence

2409.13734

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Iraq > Kurdistan Region > Sulaymaniyah Governorate (0.04)
Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Language and Speech Technology for Central Kurdish Varieties

Ahmadi, Sina, Jaff, Daban Q., Alam, Md Mahfuz Ibn, Anastasopoulos, Antonios

arXiv.org Artificial IntelligenceMar-4-2024

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in a monolithic way as a macro-language, resulting in disparities for dialects and varieties for which there are few resources and tools available. In this paper, we take a step towards developing resources for language and speech technology for varieties of Central Kurdish, creating a corpus by transcribing movies and TV series as an alternative to fieldwork. Additionally, we report the performance of machine translation, automatic speech recognition, and language identification as downstream tasks evaluated on Central Kurdish varieties. Data and models are publicly available under an open license at https://github.com/sinaahmadi/CORDI.

central kurdish, kurdish, subdialect, (15 more...)

arXiv.org Artificial Intelligence

2403.01983

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Asia > Middle East > Iran > Kurdistan Province > Sanandaj (0.08)
Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.07)
(21 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.48)
Information Technology > Security & Privacy (0.46)
Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Transfer Learning for Low-Resource Sentiment Analysis

Hameed, Razhan, Ahmadi, Sina, Daneshfar, Fatemeh

arXiv.org Artificial IntelligenceApr-10-2023

Sentiment analysis is the process of identifying and extracting subjective information from text. Despite the advances to employ cross-lingual approaches in an automatic way, the implementation and evaluation of sentiment analysis systems require language-specific data to consider various sociocultural and linguistic peculiarities. In this paper, the collection and annotation of a dataset are described for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ an approach in transfer learning to leverage pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F$_1$ score and accuracy despite the difficulty of the task.

machine learning, natural language, sentiment analysis, (15 more...)

arXiv.org Artificial Intelligence

2304.04703

Country:

Asia > Middle East (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(3 more...)

Add feedback

Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki

Ahmadi, Sina, Azin, Zahra, Belelli, Sara, Anastasopoulos, Antonios

arXiv.org Artificial IntelligenceApr-3-2023

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case of the Southern varieties of the Kurdish and Laki languages for which very limited resources are available with insubstantial progress in tools. To tackle this, we provide a few approaches that rely on the content of local news websites, a local radio station that broadcasts content in Southern Kurdish and fieldwork for Laki. In this paper, we describe some of the challenges of such under-represented languages, particularly in writing and standardization, and also, in retrieving sources of data and retro-digitizing handwritten content to create a corpus for Southern Kurdish and Laki. In addition, we study the task of language identification in light of the other variants of Kurdish and Zaza-Gorani languages.

artificial intelligence, kurdish, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.01319

Country:

Asia > Middle East > Iraq > Kurdistan Region (0.05)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(13 more...)

Genre: Research Report (0.50)

Industry:

Media > Radio (0.88)
Leisure & Entertainment (0.88)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Central Kurdish machine translation: First large scale parallel corpus and experiments

Amini, Zhila, Mohammadamini, Mohammad, Hosseini, Hawre, Mansouri, Mehran, Jaff, Daban

arXiv.org Artificial IntelligenceJun-17-2021

While the computational processing of Kurdish has experienced a relative increase, the machine translation of this language seems to be lacking a considerable body of scientific work. This is in part due to the lack of resources especially curated for this task. In this paper, we present the first large scale parallel corpus of Central Kurdish-English, Awta, containing 229,222 pairs of manually aligned translations. Our corpus is collected from different text genres and domains in an attempt to build more robust and real-world applications of machine translation. We make a portion of this corpus publicly available in order to foster research in this area. Further, we build several neural machine translation models in order to benchmark the task of Kurdish machine translation. Additionally, we perform extensive experimental analysis of results in order to identify the major challenges that Central Kurdish machine translation faces. These challenges include language-dependent and-independent ones as categorized in this paper, the first group of which are aware of Central Kurdish linguistic properties on different morphological, syntactic and semantic levels. Our best performing systems achieve 22.72 and 16.81 in BLEU score for Ku$\rightarrow$EN and En$\rightarrow$Ku, respectively.

corpus, machine translation, translation, (15 more...)

arXiv.org Artificial Intelligence

2106.09325

Country:

Asia > Middle East > Iraq > Kurdistan Region (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

Veisi, Hadi, Hosseini, Hawre, Mohammadamini, Mohammad, Fathy, Wirya, Mahmudi, Aso

arXiv.org Artificial IntelligenceFeb-15-2021

In this paper, we introduce the first large vocabulary speech recognition system (LVSR) for the Central Kurdish language, named Jira. The Kurdish language is an Indo-European language spoken by more than 30 million people in several countries, but due to the lack of speech and text resources, there is no speech recognition system for this language. To fill this gap, we introduce the first speech corpus and pronunciation lexicon for the Kurdish language. Regarding speech corpus, we designed a sentence collection in which the ratio of di-phones in the collection resembles the real data of the Central Kurdish language. The designed sentences are uttered by 576 speakers in a controlled environment with noise-free microphones (called AsoSoft Speech-Office) and in Telegram social network environment using mobile phones (denoted as AsoSoft Speech-Crowdsourcing), resulted in 43.68 hours of speech. Besides, a test set including 11 different document topics is designed and recorded in two corresponding speech conditions (i.e., Office and Crowdsourcing). Furthermore, a 60K pronunciation lexicon is prepared in this research in which we faced several challenges and proposed solutions for them. The Kurdish language has several dialects and sub-dialects that results in many lexical variations. Our methods for script standardization of lexical variations and automatic pronunciation of the lexicon tokens are presented in detail. To setup the recognition engine, we used the Kaldi toolkit. A statistical tri-gram language model that is extracted from the AsoSoft text corpus is used in the system. Several standard recipes including HMM-based models (i.e., mono, tri1, tr2, tri2, tri3), SGMM, and DNN methods are used to generate the acoustic model. These methods are trained with AsoSoft Speech-Office and AsoSoft Speech-Crowdsourcing and a combination of them. The best performance achieved by the SGMM acoustic model which results in 13.9% of the average word error rate (on different document topics) and 4.9% for the general topic.

central kurdish, corpus, kurdish, (12 more...)

arXiv.org Artificial Intelligence

2102.07412

Country:

Asia > Middle East > Iraq > Kurdistan Region (0.05)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Middle East > Syria (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry:

Government (0.46)
Media (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback