Referencing Where to Focus: Improving Visual Grounding with Referential Query

Yabing Wang