Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
Feng, Tiantian, Huang, Kevin, Xu, Anfeng, Shi, Xuan, Lertpetchpun, Thanathai, Lee, Jihwan, Lee, Yoonjeong, Byrd, Dani, Narayanan, Shrikanth
–arXiv.org Artificial Intelligence
We report comprehensive benchmark evaluations on dialects and regional language varieties in English, Arabic, Mandarin and Cantonese, Tibetan, Indic languages, Thai, Spanish, French, German, Brazilian Portuguese, and Italian. Our study uses over 2 million training utterances from 30 publicly available speech corpora that provide dialectal information. We evaluate the performance of several widely used speech foundation models in classifying speech dialects. We assess the robustness of the dialectal models under noisy conditions and present an error analysis showing that model errors align with geographic continuity. In addition to benchmarking dialect classification, we demonstrate several downstream applications enabled by Voxlect. Specifically, we show that Voxlect can be applied to augment existing speech recognition datasets with dialect information, enabling a more detailed analysis of ASR performance across dialectal variations. Voxlect is also used as a tool to evaluate the performance of speech generation systems.
Aug-5-2025
- Country:
- Africa
- Middle East > Morocco (0.04)
- South Africa (0.04)
- Asia
- Nepal (0.04)
- East Asia (0.04)
- Bhutan (0.04)
- Middle East
- Lebanon (0.04)
- Saudi Arabia (0.04)
- China
- Beijing > Beijing (0.05)
- Henan Province > Zhengzhou (0.04)
- Shanghai > Shanghai (0.04)
- Tianjin Province > Tianjin (0.04)
- Tibet Autonomous Region (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Southeast Asia (0.04)
- Thailand
- Bangkok > Bangkok (0.04)
- Chiang Mai > Chiang Mai (0.04)
- Pattani > Pattani (0.04)
- India (0.05)
- Europe
- Austria (0.04)
- Belgium (0.04)
- France (0.05)
- Germany > North Rhine-Westphalia (0.04)
- Ireland (0.04)
- Switzerland (0.04)
- United Kingdom
- England > Cambridgeshire
- Cambridge (0.04)
- Northern Ireland (0.14)
- Scotland (0.04)
- Wales (0.04)
- North America
- Canada > Quebec (0.04)
- Central America (0.14)
- Costa Rica (0.04)
- United States > California
- Los Angeles County > Los Angeles (0.28)
- Oceania (0.04)
- South America
- Argentina (0.04)
- Brazil
- Minas Gerais (0.04)
- Pernambuco > Recife (0.04)
- São Paulo (0.04)
- Chile (0.04)
- Peru (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Speech > Speech Recognition (1.00)