Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal