Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures

Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

Nov-27-2025–arXiv.org Artificial Intelligence

Separating the individual elements of a musical mixture is an essential process for music analysis and practice. This is generally addressed with neural networks optimized to mask or transform the time-frequency representation of a mixture and extract the target sources, but the flexibility and generalization capabilities of generative diffusion models are giving rise to a novel class of solutions for this complicated task. In this work, we explore singing voice separation from real music recordings using a diffusion model that is trained to generate the solo vocals conditioned on the corresponding mixture. Our approach improves upon prior generative systems and achieves competitive objective scores against non-generative baselines when trained with supplementary data. The iterative nature of diffusion sampling lets the user control the quality-efficiency trade-off and refine the output when needed. We present an ablation study of the sampling algorithm, highlighting the effects of its user-configurable parameters.
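To make the idea of mixture-conditioned, iterative sampling concrete, here is a minimal sketch of a deterministic DDIM-style sampler whose step count is chosen by the user. It is not the sampler or the model described in the paper: the linear noise schedule, the function names, and in particular `toy_noise_predictor` (a hand-written stand-in for a trained network) are all assumptions made for illustration.

```python
import numpy as np


def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level alpha_bar_t for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def toy_noise_predictor(x_t, mixture, alpha_bar_t):
    """Hypothetical stand-in for a trained network eps(x_t, t, mixture).

    It pretends the clean vocals equal 0.5 * mixture and returns the noise
    consistent with that guess. A real predictor would be learned from data.
    """
    x0_guess = 0.5 * mixture
    return (x_t - np.sqrt(alpha_bar_t) * x0_guess) / np.sqrt(1.0 - alpha_bar_t)


def sample_vocals(mixture, num_steps=50, seed=0,
                  noise_predictor=toy_noise_predictor):
    """Deterministic DDIM-style sampling conditioned on the mixture.

    num_steps is the user-facing knob: fewer steps is cheaper, more steps
    gives the (real) model more chances to refine its estimate.
    """
    alpha_bar = make_alpha_bar()
    # Evenly spaced subset of the training timesteps, from noisy to clean.
    timesteps = np.linspace(len(alpha_bar) - 1, 0, num_steps).round().astype(int)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mixture.shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        eps = noise_predictor(x, mixture, a_t)
        # Current estimate of the clean vocals implied by the predicted noise.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        if i + 1 < len(timesteps):
            # Move to the next, less noisy timestep along the same noise direction.
            a_prev = alpha_bar[timesteps[i + 1]]
            x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
        else:
            x = x0_pred
    return x
```

Calling `sample_vocals(mixture, num_steps=10)` and `sample_vocals(mixture, num_steps=200)` runs the same loop at different costs. With the toy predictor the output does not depend on the step count, so the sketch only shows where the quality-efficiency knob sits, not the trade-off itself.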

artificial intelligence, machine learning, separation

Country:
- Asia > India (0.29)
- Europe
  - Italy (0.28)
  - Austria (0.28)

Genre:
- Research Report (0.50)

Industry:
- Media (0.67)
- Leisure & Entertainment (0.67)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
