Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
