Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

Aditya Ravuri, Erica Cooper, Junichi Yamagishi

Dec-25-2023–arXiv.org Machine Learning

This paper addresses the gap in efficient audio quality prediction, especially in low-resource settings where extensive MOS data from large-scale listening tests may be unavailable. We demonstrate that uncertainty measures derived from out-of-the-box pretrained self-supervised learning (SSL) models, such as wav2vec, correlate with MOS scores. These findings are based on data from the 2022 and 2023 VoiceMOS challenges. We explore the extent of this correlation across different models and language contexts, revealing insights into how inherent uncertainties in SSL models can serve as effective proxies for audio quality.

We are particularly inspired by approaches in biology where zero-shot prediction is possible using a model's uncertainty estimates, where uncertainties act as proxies for downstream tasks [4]. Our main hypotheses are that,

1. uncertainty estimates can be derived from the outputs of SSL models such as wav2vec, and that,
2. these uncertainties can be used as proxies to MOS scores as high model uncertainty around the contents of an audio sequence must correspond to low audio quality.
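The two hypotheses can be illustrated with a minimal sketch. It assumes that the uncertainty of an utterance is measured as the mean Shannon entropy of per-frame softmax distributions; the text here does not specify the exact measure, so that choice, the helper names (`mean_frame_entropy`, `spearman`), and the toy logits and MOS values are all hypothetical. In practice the per-frame logits would come from a pretrained SSL model such as wav2vec; here they are hand-written so the example stays self-contained.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(frame_logits):
    """Mean Shannon entropy (in nats) across frames.

    Assumption: entropy stands in for "model uncertainty"; this is
    an illustrative choice, not necessarily the paper's measure.
    """
    entropies = []
    for logits in frame_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return sum(entropies) / len(entropies)


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = _ranks(xs), _ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Synthetic per-frame logits for three utterances (hypothetical values).
toy_logits = [
    [[8.0, 0.0, 0.0], [7.0, 0.5, 0.0]],  # confident model outputs
    [[2.0, 1.0, 0.0], [1.5, 1.0, 0.5]],  # moderately uncertain
    [[0.1, 0.0, 0.0], [0.0, 0.0, 0.1]],  # close to uniform
]
toy_mos = [4.5, 3.0, 1.5]  # hypothetical listener ratings

# Zero-shot quality score: negated uncertainty, so higher means better.
scores = [-mean_frame_entropy(u) for u in toy_logits]
print(spearman(scores, toy_mos))
```

A rank correlation is used because a zero-shot proxy is only expected to order utterances consistently with listener ratings, not to match the MOS scale itself.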
