Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer