Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Neural Information Processing Systems
We propose an unsupervised adaptation framework, Self-Taught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models built on Transformer-related architectures with auto-regressive decoding (e.g., Whisper, Canary, and SeamlessM4T).
Oct-9-2025, 22:55:14 GMT
- Country:
- Asia > Singapore (0.04)
- Europe > Czechia
- Prague (0.04)
- North America > United States
- Oregon > Lane County > Eugene (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Education (0.93)
- Information Technology (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.67)
- Natural Language (1.00)
- Representation & Reasoning (0.93)
- Speech > Speech Recognition (1.00)
- Vision (0.93)