MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Neural Information Processing Systems
Unlike prior approaches constrained to fixed instrument classes, MGE-LDM learns a joint distribution over full mixtures, submixtures, and individual stems within a single compact latent diffusion model. At inference, MGE-LDM enables (1) complete mixture generation, (2) partial generation (i.e., source imputation), and (3) text-conditioned extraction of arbitrary sources. By formulating both separation and imputation as conditional inpainting tasks in the latent space, our approach supports flexible, class-agnostic manipulation of arbitrary instrument sources. Notably, MGE-LDM can be trained jointly across heterogeneous multi-track datasets (e.g., Slakh2100, MUSDB18, MoisesDB) without relying on predefined instrument categories. Audio samples are available on our project page.
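The inpainting view of the three inference modes can be illustrated with a toy sketch. Nothing below is taken from the MGE-LDM implementation: the three-slot layout (mixture, submixture, source), the per-task choice of which slots are observed, the names `TASK_MASKS`, `task_mask`, and `inpaint`, and the placeholder `denoise_step` are all assumptions made for illustration. The loop also clamps observed slots to their clean values after every step, a simplification of replacement-style inpainting, which would normally re-noise the observed latents to the current noise level.

```python
import numpy as np

# Hypothetical slot order for the jointly modelled latents.
TRACKS = ("mixture", "submixture", "source")

# Assumed task definitions: True = slot is observed and held fixed,
# False = slot is generated by the sampler.
TASK_MASKS = {
    "generate_mixture": (False, False, False),  # generate everything
    "impute_source": (False, True, False),      # submixture given
    "extract_source": (True, False, False),     # mixture given
}


def task_mask(task):
    """Return a (3, 1) boolean mask that broadcasts over latent dimensions."""
    return np.array(TASK_MASKS[task], dtype=bool).reshape(len(TRACKS), 1)


def inpaint(known, mask, denoise_step, num_steps, rng):
    """Toy inpainting loop over stacked latents of shape (3, D).

    `denoise_step(x, step)` stands in for one reverse-diffusion update.
    Observed slots are overwritten with `known` after every update.
    """
    x = rng.standard_normal(known.shape)
    for step in range(num_steps):
        x = denoise_step(x, step)
        x = np.where(mask, known, x)  # keep observed slots fixed
    return x
```

With a trained denoiser in place of `denoise_step`, calling `inpaint(latents, task_mask("extract_source"), ...)` would hold the mixture slot fixed and sample the remaining two slots; text conditioning is omitted from this sketch.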
Jun-18-2026, 22:22:46 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
  - Overview (0.92)
- Industry:
  - Media > Music (1.00)
  - Leisure & Entertainment (1.00)
- Technology: