Unsupervised speech enhancement with deep dynamical generative speech and noise models
Lin, Xiaoyu, Leglaive, Simon, Girin, Laurent, Alameda-Pineda, Xavier
arXiv.org Artificial Intelligence
Noise-dependent (ND) methods use noise or noisy speech training samples to learn some noise characteristics. In contrast, noise-agnostic (NA) methods only use clean speech signals for training, and the noise characteristics are estimated at test time for each noisy speech sequence to process. A typical unsupervised NA approach uses a pre-trained variational autoencoder (VAE) as a prior distribution of the clean speech signal and a non-negative matrix factorization (NMF) model for the noise variance.

This work addresses unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, or on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training.
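The NMF noise model mentioned above factors the noise variance of each time-frequency bin as a low-rank product V = W H. A minimal sketch of how such a model can be fitted with multiplicative updates under the Itakura-Saito divergence is shown below; the function names and the toy spectrogram are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_divergence(S, V):
    """Itakura-Saito divergence between a power spectrogram S and a model V."""
    R = S / V
    return float(np.sum(R - np.log(R) - 1.0))

def fit_nmf_noise_model(S, K=8, n_iter=50):
    """Fit a rank-K NMF noise variance model V = W @ H to the power
    spectrogram S (frequency x time) using the standard multiplicative
    updates for the Itakura-Saito divergence."""
    F, T = S.shape
    W = rng.random((F, K)) + 0.1  # non-negative random initialization
    H = rng.random((K, T)) + 0.1
    losses = [is_divergence(S, W @ H)]
    for _ in range(n_iter):
        V = W @ H
        W *= ((S / V**2) @ H.T) / ((1.0 / V) @ H.T)
        V = W @ H
        H *= (W.T @ (S / V**2)) / (W.T @ (1.0 / V))
        losses.append(is_divergence(S, W @ H))
    return W, H, losses

# Toy "noisy" power spectrogram; in practice this would come from an STFT.
S = rng.random((64, 100)) + 0.1
W, H, losses = fit_nmf_noise_model(S)
V = W @ H  # low-rank estimate of the noise variance per time-frequency bin
```

The DDGM proposed in the paper replaces this fixed low-rank parameterization with a learned recurrent network, so the noise variance can depend on the DVAE latent variables and/or the noisy observations over time.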
Jun-13-2023