RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models
