T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models

Open in new window