X-VILA: Cross-Modality Alignment for Large Language Model
