A variance modeling framework based on variational autoencoders for speech enhancement

Simon Leglaive, Laurent Girin, Radu Horaud

arXiv.org Machine Learning 

Grenoble Alpes, Grenoble INP, GIPSA-lab, France

ABSTRACT

In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach. We explore the use of neural networks as an alternative to a popular speech variance model based on supervised nonnegative matrix factorization (NMF). More precisely, we use a variational autoencoder as a speaker-independent supervised generative speech model, highlighting the conceptual similarities that this approach shares with its NMF-based counterpart. To avoid generalization issues with respect to the noisy recording environment, we use a supervised model only for the target speech signal; the noise model is based on unsupervised NMF. We develop a Monte Carlo expectation-maximization algorithm for inferring the latent variables of the variational autoencoder and estimating the unsupervised model parameters. Experiments show that the proposed method outperforms a semi-supervised NMF baseline and a state-of-the-art fully supervised deep learning approach.

Index Terms -- Audio source separation, speech enhancement, variational autoencoders, nonnegative matrix factorization, Monte Carlo expectation-maximization

1. INTRODUCTION

Speech enhancement is a classical problem of speech processing, which aims to recover a clean speech signal from the recording of a noisy signal, where the noise is generally considered additive [1].
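The abstract describes the VAE as a generative speech variance model: at each time frame, a latent vector drawn from a standard Gaussian prior is decoded into a per-frequency variance, and the speech STFT coefficients are modeled as zero-mean complex Gaussians with those variances. The following NumPy sketch illustrates only that generative step; the layer sizes and randomly initialized weights are illustrative assumptions, not the authors' trained network.

```python
import numpy as np

def decode_speech_variance(z, W1, b1, W2, b2):
    """Toy decoder: map a latent vector z (L,) to per-frequency variances
    sigma2 (F,) through one tanh hidden layer; exp keeps variances positive."""
    h = np.tanh(W1 @ z + b1)
    return np.exp(W2 @ h + b2)

rng = np.random.default_rng(0)
L, H, F = 8, 16, 32  # illustrative latent, hidden, and frequency dimensions
W1 = 0.1 * rng.standard_normal((H, L)); b1 = np.zeros(H)
W2 = 0.1 * rng.standard_normal((F, H)); b2 = np.zeros(F)

z = rng.standard_normal(L)                    # prior p(z) = N(0, I)
sigma2 = decode_speech_variance(z, W1, b1, W2, b2)
# Zero-mean circularly symmetric complex Gaussian sample per frequency bin:
s = np.sqrt(sigma2 / 2) * (rng.standard_normal(F) + 1j * rng.standard_normal(F))
```

In the paper's setting the decoder weights would be trained on clean speech, and inference over z in the noisy mixture is handled by the Monte Carlo EM algorithm.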
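For the unsupervised noise model, NMF factorizes a nonnegative power spectrogram into a spectral dictionary times temporal activations; in the variance-modeling framework this is typically done under the Itakura-Saito divergence with multiplicative updates. A self-contained sketch under that assumption (function names, toy dimensions, and data are illustrative, not the authors' code):

```python
import numpy as np

def is_nmf(V, K, n_iter=100, eps=1e-9, seed=0):
    """Factorize a nonnegative power spectrogram V (F x N) as W @ H using
    multiplicative updates for the Itakura-Saito divergence."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.uniform(0.5, 1.5, (F, K))
    H = rng.uniform(0.5, 1.5, (K, N))
    for _ in range(n_iter):
        Lam = W @ H + eps
        W *= ((V / Lam**2) @ H.T) / ((1.0 / Lam) @ H.T + eps)
        Lam = W @ H + eps
        H *= (W.T @ (V / Lam**2)) / (W.T @ (1.0 / Lam) + eps)
    return W, H

def is_divergence(V, Lam, eps=1e-9):
    """Itakura-Saito divergence D_IS(V | Lam), summed over all entries."""
    R = (V + eps) / (Lam + eps)
    return float(np.sum(R - np.log(R) - 1.0))

# Toy data: a power spectrogram generated from a rank-2 ground truth.
rng = np.random.default_rng(1)
V = rng.uniform(0.5, 1.5, (16, 2)) @ rng.uniform(0.5, 1.5, (2, 20))
W, H = is_nmf(V, K=2)
```

The updates keep W and H nonnegative by construction, which is why no explicit projection step is needed.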
