Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi
arXiv.org Artificial Intelligence
Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings are yet to be thoroughly explored. To address this, we conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting across 7 speech understanding and generation tasks. We provide various quantitative and qualitative analyses on task performance, convergence speed, and representational properties of the encoders.

To better understand the interplay between pre-training protocols of speech encoders, the amount of fine-tuning data, and speech task types, we conduct a comprehensive study in this work. We evaluate a set of three very recent speech models (Wav2vec2, WavLM, and Whisper) and assess their performance on 7 downstream tasks (covering content, speaker and semantic types) in the low-resource setting. Through extensive experiments in the low-resource setting, we found that Whisper significantly outperforms Wav2vec2 and WavLM by a large margin on content-related (content, semantics) tasks, and shows performance degradation when speaker information is required.
May-28-2023
- Country:
- Asia > Middle East
- UAE (0.14)
- Europe > Czechia (0.14)
- Genre:
- Research Report (0.50)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Speech > Speech Recognition (1.00)