A Generalization Theory of Cross-Modality Distillation with Contrastive Learning