Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders

Manea, Andrei-Alexandru, Libovický, Jindřich

Aug-18-2025–arXiv.org Artificial Intelligence

Most pre-trained Vision-Language (VL) models and training data for the downstream tasks are only available in English. Therefore, multilingual VL tasks are solved using cross-lingual transfer: fine-tune a multilingual pre-trained model or transfer the text encoder using parallel data. We study the alternative approach: transferring an already trained encoder using parallel data. We investigate the effect of parallel data: domain and the number of languages, which were out of focus in previous work. Our results show that even machine-translated task data are the best on average, caption-like authentic parallel data outperformed it in some languages. Further, we show that most languages benefit from multilingual training.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Aug-18-2025

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- Asia (1.00)
- North America > United States
  - Minnesota (0.28)

Genre:
- Research Report > New Finding (0.86)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Machine Translation (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found