Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
