Learning from Models and Data for Visual Grounding