AITopics | diffusion-based speech enhancement

Collaborating Authors

diffusion-based speech enhancement

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

Ayilo, Jean-Eudes, Sadeghi, Mostafa, Serizel, Romain

arXiv.org Machine LearningSep-19-2023

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at noisy speech, and subsequently learn a parameterized model to reverse this process, conditionally on noisy speech. Unlike supervised methods, generative-based SE approaches usually rely solely on an unsupervised loss, which may result in less efficient incorporation of conditioned noisy speech. To address this issue, we propose augmenting the original diffusion training objective with a mean squared error (MSE) loss, measuring the discrepancy between estimated enhanced speech and ground-truth clean speech at each reverse process iteration. Experimental results demonstrate the effectiveness of our proposed methodology.

diffusion-based speech enhancement, weighted generative-supervised learning loss

arXiv.org Machine Learning

2309.10457

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.40)

Add feedback

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Shi, Hao, Shimada, Kazuki, Hirano, Masato, Shibuya, Takashi, Koyama, Yuichiro, Zhong, Zhi, Takahashi, Shusuke, Kawahara, Tatsuya, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceMay-18-2023

Diffusion-based speech enhancement (SE) has been investigated recently, but its decoding is very time-consuming. One solution is to initialize the decoding process with the enhanced feature estimated by a predictive SE system. However, this two-stage method ignores the complementarity between predictive and diffusion SE. In this paper, we propose a unified system that integrates these two SE modules. The system encodes both generative and predictive information, and then applies both generative and predictive decoders, whose outputs are fused. Specifically, the two SE modules are fused in the first and final diffusion steps: the first step fusion initializes the diffusion process with the predictive SE for improving the convergence, and the final step fusion combines the two complementary SE outputs to improve the SE performance. Experiments on the Voice-Bank dataset show that the diffusion score estimation can benefit from the predictive information and speed up the decoding.

artificial intelligence, diffusion step, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2305.10734

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)

Genre: Workflow (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback