Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
