MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence

May-27-2025, 20:04:48 GMT–Neural Information Processing Systems

Motion-to-music and music-to-motion generation have been studied separately, each attracting substantial research interest in its own domain. The interaction between human motion and music reflects advanced human intelligence, and establishing a unified relationship between the two is particularly important. However, no prior work has considered them jointly to explore the alignment between the two modalities. To bridge this gap, we propose a novel framework, termed MoMu-Diffusion, for long-term and synchronous motion-music generation. First, to mitigate the high computational cost incurred by long sequences, we propose a novel Bidirectional Contrastive Rhythmic Variational Auto-Encoder (BiCoR-VAE) that extracts modality-aligned latent representations for both motion and music inputs.

learning long-term motion-music synchronization, long-term motion-music synchronization and correspondence, momu-diffusion, (1 more...)

Industry:
- Media > Music (0.62)
- Leisure & Entertainment (0.62)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.42)