Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation

Open in new window