Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
