Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
