EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation