On the Compositional Generalization of Multimodal LLMs for Medical Imaging