InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
