Voice of a Continent: Mapping Africa's Speech Technology Frontier
Elmadany, AbdelRahim; Kwon, Sang Yun; Toyin, Hawau Olamide; Inciarte, Alcides Alcoba; Aldarmaki, Hanan; Abdul-Mageed, Muhammad
arXiv.org Artificial Intelligence
Africa's rich linguistic diversity remains significantly underrepresented in speech technologies, creating barriers to digital inclusion. To address this challenge, we systematically map the continent's speech datasets and technologies, leading to SimbaBench, a new comprehensive benchmark for downstream African speech tasks. Using SimbaBench, we introduce the Simba family of models, which achieve state-of-the-art performance across multiple African languages and speech tasks. Our benchmark analysis reveals critical patterns in resource availability, while our model evaluation shows how dataset quality, domain diversity, and language family relationships influence performance across languages. Our work highlights the need for expanded speech technology resources that better reflect Africa's linguistic diversity, and it provides a solid foundation for future research and development toward more inclusive speech technologies.
Jul-8-2025
- Country:
  - Africa
    - Ghana (0.04)
    - Kenya (0.04)
    - Niger (0.04)
    - Senegal
      - Dakar Region > Dakar (0.04)
      - Thiès Region > Thiès (0.04)
    - South Africa (0.04)
    - Tanzania (0.04)
    - West Africa (0.04)
    - Zambia (0.04)
  - Asia
    - China > Shanghai
      - Shanghai (0.04)
    - Indonesia > Bali (0.04)
    - Japan > Kyūshū & Okinawa
      - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
    - Middle East
      - Israel (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - Russia (0.04)
    - Singapore (0.04)
  - Europe
    - Czechia > South Moravian Region
      - Brno (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Russia > Northwestern Federal District
      - Leningrad Oblast > Saint Petersburg (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
  - North America
    - Canada > British Columbia (0.04)
    - United States (0.05)
  - Oceania > Tonga (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.46)
- Industry:
  - Leisure & Entertainment (1.00)
  - Media > Radio (0.67)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (0.67)
      - Speech > Speech Recognition (1.00)
    - Data Science (0.93)