Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags

Open in new window