MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence