Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
