Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages
–arXiv.org Artificial Intelligence
Automatic Speech Recognition (ASR) has reached impressive accuracy for high-resource languages, yet its utility in linguistic fieldwork remains limited. Recordings collected in fieldwork contexts present unique challenges, including spontaneous speech, environmental noise, and severely constrained datasets from under-documented languages. In this paper, we benchmark the performance of two fine-tuned multilingual ASR models, MMS and XLS-R, on five typologically diverse low-resource languages with control of training data duration. Our findings show that MMS is best suited when extremely small amounts of training data are available, whereas XLS-R shows parity performance once training data exceed one hour. We provide linguistically grounded analysis for further provide insights towards practical guidelines for field linguists, highlighting reproducible ASR adaptation approaches to mitigate the transcription bottleneck in language documentation.
arXiv.org Artificial Intelligence
Jun-25-2025
- Country:
- Africa
- Asia
- Indonesia > Sulawesi
- North Sulawesi > Manado (0.04)
- Russia > Far Eastern Federal District
- Sakhalin Oblast > Sakhalin Island (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Indonesia > Sulawesi
- Europe
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Croatia > Dubrovnik-Neretva County
- North America
- Oceania > Papua New Guinea (0.04)
- South America > Ecuador (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence