IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes
