Referencing Where to Focus: Improving Visual Grounding with Referential Query
Yabing Wang