Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
