MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation