IPA-CLIP: Integrating Phonetic Priors into Vision and Language Pretraining
