HausaMovieReview: A Benchmark Dataset for Sentiment Analysis in Low-Resource African Language

Zanga, Asiya Ibrahim, Abdulrahman, Salisu Mamman, Ado, Abubakar, Bichi, Abdulkadir Abubakar, Jibril, Lukman Aliyu, Umar, Abdulmajid Babangida, Adamu, Alhassan, Muhammad, Shamsuddeen Hassan, Abubakar, Bashir Salisu

Sep-23-2025–arXiv.org Artificial Intelligence

The development of Natural Language Processing (NLP) tools for low-resource languages is critically hindered by the scarcity of annotated datasets. This paper addresses that challenge by introducing HausaMovieReview, a benchmark dataset of 5,000 YouTube comments in Hausa and code-switched English. Three independent annotators labeled the dataset, reaching a Fleiss' Kappa of 0.85. We used the dataset to compare classical models (Logistic Regression, Decision Tree, K-Nearest Neighbors) with fine-tuned transformer models (BERT and RoBERTa). The key finding is that the Decision Tree classifier, with an accuracy of 89.72% and an F1-score of 89.60%, significantly outperformed the deep learning models. These results provide a baseline and show that effective feature engineering can enable classical models to achieve state-of-the-art performance in low-resource contexts, laying a foundation for future research.

Keywords: Hausa, Kannywood, Low-Resource Languages, NLP, Sentiment Analysis
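
As an illustration of the agreement statistic mentioned in the abstract, the following is a minimal sketch of Fleiss' Kappa for a fixed number of raters per item. The function name `fleiss_kappa`, the label names, and the toy ratings are hypothetical and are not taken from the dataset or the authors' code.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Compute Fleiss' Kappa.

    ratings: list of items, each a list of labels (one label per rater).
    categories: list of all possible labels.
    Assumes every item was labeled by the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(s ** 2 for s in shares)

    if p_expected == 1.0:
        # Degenerate case: every rating falls in a single category.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: three raters, three sentiment labels.
LABELS = ["positive", "negative", "neutral"]
toy_ratings = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["positive", "negative", "positive"],
]
toy_kappa = fleiss_kappa(toy_ratings, LABELS)
```

Values near 1 indicate agreement well above chance, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.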


