Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding