Phantom: Subject-consistent video generation via cross-modal alignment
