A weighted-variance variational autoencoder model for speech enhancement

Golmakani, Ali, Sadeghi, Mostafa, Alameda-Pineda, Xavier, Serizel, Romain

Oct-26-2023–arXiv.org Artificial Intelligence

We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.

artificial intelligence, machine learning, speech enhancement, (15 more...)

arXiv.org Artificial Intelligence

Oct-26-2023

arXiv.org PDF

Add feedback

Country:
- Europe > France (0.29)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found