Making Large Multimodal Models Understand Arbitrary Visual Prompts
