Idiom Detection in Sorani Kurdish Texts
Omer, Skala Kamaran, Hassani, Hossein
arXiv.org Artificial Intelligence
Idiom detection using Natural Language Processing (NLP) is the automated process of recognizing figurative expressions in a text whose meanings go beyond the literal interpretation of their words. While idiom detection has seen significant progress across various languages, Kurdish faces a considerable research gap in this area, despite the importance of idioms in tasks such as machine translation and sentiment analysis. This study addresses idiom detection in Sorani Kurdish by approaching it as a text classification task using deep learning techniques. To this end, we developed a dataset of 10,580 sentences embedding 101 Sorani Kurdish idioms across diverse contexts. Using this dataset, we developed and evaluated three deep learning models: a KuBERT-based transformer sequence classifier, a Recurrent Convolutional Neural Network (RCNN), and a BiLSTM model with an attention mechanism. The evaluations revealed that the transformer model, a fine-tuned BERT, consistently outperformed the others, achieving nearly 99% accuracy, while the RCNN achieved 96.5% and the BiLSTM 80%. These results highlight the effectiveness of Transformer-based architectures in low-resource languages like Kurdish. This research provides a dataset, three optimized models, and insights into idiom detection, laying a foundation for advancing Kurdish NLP.
Jan-30-2025
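The abstract mentions a BiLSTM model with an attention mechanism but does not say which form of attention is used. As a rough illustration only, the sketch below assumes dot-product attention with a single learned query vector, pooled over per-token hidden states, followed by a logistic output for the idiom / non-idiom decision. The function names, the toy hidden states, and all weights are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Pool per-token vectors into one sentence vector.

    Assumption: dot-product scoring against one query vector.
    Returns (context_vector, attention_weights).
    """
    # One relevance score per token.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of token vectors, dimension by dimension.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


def idiom_probability(context: List[float], out_weights: List[float], bias: float) -> float:
    """Logistic output layer: probability that the sentence uses an idiom."""
    logit = sum(c * w for c, w in zip(context, out_weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy stand-ins for BiLSTM outputs of a three-token sentence (hypothetical values).
    states = [[0.2, 0.1], [0.9, 0.7], [0.1, 0.3]]
    context, weights = attention_pool(states, query=[1.0, 1.0])
    print("attention weights:", weights)
    print("idiom probability:", idiom_probability(context, [0.5, -0.2], 0.0))
```

In a trained model the query vector and output weights would be learned jointly with the BiLSTM; here they are fixed by hand so the arithmetic can be followed step by step.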
- Country:
- Asia
- India (0.04)
- Middle East > Iraq
- Erbil Governorate > Erbil (0.04)
- Kurdistan Region (0.14)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Bulgaria (0.04)
- Czechia > Prague (0.04)
- Germany > Berlin (0.04)
- Greece > Attica
- Athens (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Poland > Masovia Province
- Warsaw (0.04)
- Spain
- Andalusia > Málaga Province
- Málaga (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Sweden > Västra Götaland
- Gothenburg (0.04)
- North America
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Hawaii > Honolulu County
- Honolulu (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Research Report > Experimental Study (0.34)
- Technology: