Advancing Conversational AI with Shona Slang: A Dataset and Hybrid Model for Digital Inclusion

Sep-19-2025–arXiv.org Artificial Intelligence

The proliferation of artificial intelligence (AI) systems, from virtual assistants [Kepuska and Bohouta, 2018] to recommendation engines [Gomez-Uribe and Hunt, 2015] and autonomous vehicles [Shladover, 2018], has reshaped human-machine interaction. Yet, African languages, with over 2,000 spoken across the continent [Eberhard et al., 2023], remain severely underrepresented in NLP due to their low-resource status [Ahia and Boakye, 2023, Nekoto et al., 2020]. This exclusion risks exacerbating the digital divide, limiting access to AI-driven services in critical domains like education, healthcare, and governance [Ndichu et al., 2024, Joshi et al., 2020]. Shona, a Bantu language spoken by millions in Zimbabwe and southern Zambia, exemplifies this challenge. Existing Shona corpora primarily consist of formal texts, such as news articles or religious documents [Eberhard et al., 2023], while everyday communication, particularly among younger speakers, is dominated by slang, code-mixing with English, and informal expressions [Eisenstein, 2013]. Standard NLP models, trained on formal data, struggle to process these dynamic linguistic patterns, hindering the development of culturally relevant conversational AI.

Country:
- Africa
  - Zimbabwe (0.25)
  - Zambia (0.25)

Genre:
- Research Report (0.40)

Industry:
- Health & Medicine (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Natural Language > Chatbot (1.00)
  - Machine Learning (1.00)
