Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

Aditya Ravuri, Erica Cooper, Junichi Yamagishi

arXiv.org Machine Learning 

This paper addresses the gap in efficient audio quality prediction, especially in low-resource settings where extensive MOS data from large-scale listening tests may be unavailable. We demonstrate that uncertainty measures derived from out-of-the-box pretrained self-supervised learning (SSL) models, such as wav2vec, correlate with MOS scores. These findings are based on data from the 2022 and 2023 VoiceMOS challenges. We explore the extent of this correlation across different models and language contexts, revealing insights into how inherent uncertainties in SSL models can serve as effective proxies for audio quality.

We are particularly inspired by approaches in biology where zero-shot prediction is possible using a model's uncertainty estimates, with those uncertainties acting as proxies for downstream tasks [4]. Our main hypotheses are that:

1. uncertainty estimates can be derived from the outputs of SSL models such as wav2vec; and
2. these uncertainties can be used as proxies for MOS scores, since high model uncertainty about the contents of an audio sequence must correspond to low audio quality.
