Investigating Pre-trained Audio Encoders in the Low-Resource Condition

Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi

arXiv.org Artificial Intelligence 

Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings are yet to be thoroughly explored. To address this, we conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting across 7 speech understanding and generation tasks. We provide various quantitative and qualitative analyses on task performance, convergence speed, and representational properties of the encoders.

To better understand the interplay between pre-training protocols of speech encoders, the amount of fine-tuning data, and speech task types, we conduct a comprehensive study in this work. We evaluate a set of three very recent speech models (Wav2vec2, WavLM, and Whisper) and assess their performance on 7 downstream tasks (covering content, speaker and semantic types) in the low-resource setting. Through extensive experiments in the low-resource setting, we found that Whisper significantly outperforms Wav2vec2 and WavLM by a large margin on content-related (content, semantics) tasks, and shows performance degradation when speaker information is required.
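To make the evaluation setup concrete, the sketch below shows one way to extract utterance-level representations from the three studied encoders via Hugging Face Transformers. The checkpoint names, the `encode` helper, and the mean-pooling step are assumptions for illustration only, not the authors' actual pipeline.

```python
# Minimal sketch (assumed setup, not the paper's code): extract a fixed-size
# utterance embedding from each of the three studied encoder families.
import numpy as np
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Hypothetical checkpoint choices; the abstract does not specify exact sizes.
ENCODERS = {
    "wav2vec2": "facebook/wav2vec2-base",
    "wavlm": "microsoft/wavlm-base",
    "whisper": "openai/whisper-base",
}

@torch.no_grad()
def encode(checkpoint: str, waveform: np.ndarray, sampling_rate: int = 16000) -> torch.Tensor:
    """Mean-pooled last-layer hidden states for a mono 16 kHz waveform."""
    extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint).eval()
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    if "whisper" in checkpoint:
        # Whisper is an encoder-decoder ASR model; use only its encoder,
        # which consumes log-mel spectrogram features.
        hidden = model.encoder(inputs.input_features).last_hidden_state
    else:
        # Wav2vec2 and WavLM are self-supervised encoders over raw waveforms.
        hidden = model(inputs.input_values).last_hidden_state
    return hidden.mean(dim=1)  # pool over time -> shape (1, hidden_size)

if __name__ == "__main__":
    dummy = np.random.randn(16000).astype(np.float32)  # 1 second of noise
    for name, ckpt in ENCODERS.items():
        print(name, tuple(encode(ckpt, dummy).shape))
```

Representations such as these could then be probed with lightweight task-specific heads across the 7 downstream tasks; whether the encoder stays frozen or is fine-tuned on the limited data is exactly the kind of variable a low-resource study like this one manipulates.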
