Cross-Modal Instructions for Robot Motion Generation