Advancing Conversational AI with Shona Slang: A Dataset and Hybrid Model for Digital Inclusion

Masoka, Happymore

arXiv.org Artificial Intelligence 

The proliferation of artificial intelligence (AI) systems, from virtual assistants [Kepuska and Bohouta, 2018] to recommendation engines [Gomez-Uribe and Hunt, 2015] and autonomous vehicles [Shladover, 2018], has reshaped human-machine interaction. Yet African languages, with over 2,000 spoken across the continent [Eberhard et al., 2023], remain severely underrepresented in NLP research and tooling, largely because they are low-resource [Ahia and Boakye, 2023, Nekoto et al., 2020]. This exclusion risks widening the digital divide by limiting access to AI-driven services in critical domains such as education, healthcare, and governance [Ndichu et al., 2024, Joshi et al., 2020]. Shona, a Bantu language spoken by millions in Zimbabwe and southern Zambia, exemplifies this challenge. Existing Shona corpora consist primarily of formal texts, such as news articles or religious documents [Eberhard et al., 2023], whereas everyday communication, particularly among younger speakers, is dominated by slang, code-mixing with English, and informal expressions [Eisenstein, 2013]. Standard NLP models trained on formal data struggle to process these dynamic linguistic patterns, which hinders the development of culturally relevant conversational AI.
