Multimodal LLMs Do Not Compose Skills Optimally Across Modalities
