Improving Vision-Language-Action Models via Chain-of-Affordance
