Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
