4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
