Cross-Lingual Representation Alignment Through Contrastive Image-Caption Tuning

Krasner, Nathaniel, Lanuzo, Nicholas, Anastasopoulos, Antonios

May-21-2025–arXiv.org Artificial Intelligence

Multilingual alignment of sentence representations has mostly required bitexts to bridge the gap between languages. We investigate whether visual information can bridge this gap instead. Image caption datasets are very easy to create without requiring multilingual expertise, so this offers a more efficient alternative for low-resource languages. We find that multilingual image-caption alignment can implicitly align the text representations between languages, languages unseen by the encoder in pretraining can be incorporated into this alignment post-hoc, and these aligned representations are usable for cross-lingual Natural Language Understanding (NLU) and bitext retrieval.

artificial intelligence, natural language, text processing, (18 more...)

arXiv.org Artificial Intelligence

May-21-2025

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- Asia (0.68)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found