Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
Chen, Chen; Hu, Yuchen; Weng, Weiwei; Chng, Eng Siong
arXiv.org Artificial Intelligence
Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data. However, task-specific evaluation metrics (e.g., PESQ) are usually non-differentiable and cannot be directly incorporated into the training criterion. This mismatch between the training objective and the evaluation metric likely results in sub-optimal performance. To alleviate it, we propose a metric-oriented speech enhancement method (MOSE), which leverages recent advances in diffusion probabilistic models and integrates a metric-oriented training strategy into the reverse process. Specifically, we design an actor-critic based framework that treats the evaluation metric as a posterior reward, thus guiding the reverse process in the metric-increasing direction. Experimental results demonstrate that MOSE clearly benefits from metric-oriented training and surpasses the generative baselines on all evaluation metrics.
Feb-23-2023