villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
