Scaling HuBERT for African Languages: From Base to Large and XL
Antoine Caubrière, Elodie Gauthier
–arXiv.org Artificial Intelligence
Despite recent progress in multilingual speech processing, African languages remain under-represented in both research and deployed systems, particularly when it comes to strong, open-weight encoders that transfer well under low-resource supervision. Self-supervised learning has proven especially promising in such settings, yet most publicly released models targeting African speech remain at BASE scale, leaving open whether larger encoders, trained exclusively on Africa-centric audio, offer tangible benefits and how model capacity interacts with data composition. This work addresses that gap by introducing SSA-HuBERT-Large (317M parameters) and SSA-HuBERT-XL (964M parameters), the first large-scale models trained solely on African speech, alongside a BASE-size counterpart. We release these models as open weights: see https://huggingface.co/collections/Orange/african-speech-foundation-models. Through a carefully controlled experimental study focused exclusively on Sub-Saharan languages, covering automatic speech recognition (ASR) and language identification (LID) tasks, we demonstrate that larger architectures significantly improve performance by effectively leveraging large audio datasets.
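Since the models are released as open weights in the HuBERT family, they should expose the standard HuBERT feature-extraction interface. As a minimal sketch, and to stay self-contained, the example below builds a randomly initialized BASE-scale HuBERT rather than downloading a released checkpoint; the exact repo ids under the Orange collection are an assumption not spelled out in the abstract, so swap one in via `from_pretrained` when using the real weights.

```python
import torch
from transformers import HubertConfig, HubertModel

# Randomly initialized BASE-scale HuBERT (~95M parameters), standing in for
# a released checkpoint. For the real weights one would instead call, e.g.,
# HubertModel.from_pretrained("<repo id from the Orange collection>")  # hypothetical id
config = HubertConfig()          # BASE defaults: 768-d hidden states
model = HubertModel(config).eval()

wav = torch.randn(1, 16000)      # 1 second of 16 kHz audio (batch of 1)
with torch.no_grad():
    hidden = model(wav).last_hidden_state

# One 768-d vector per ~20 ms frame of input audio
print(hidden.shape)
```

These frame-level representations are what downstream ASR or LID heads are typically fine-tuned on top of.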
Dec-1-2025