Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth S, Snehanshu Mukherjee, Alham Fikri Aji
–arXiv.org Artificial Intelligence
Recent years have seen rapid development of large Vision-Language Models (VLMs). They show impressive results on academic benchmarks, primarily in widely spoken languages, but perform poorly on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks.
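The abstract describes extending the English LLaVA pretraining data into eight languages. As a minimal sketch of that idea (not the paper's actual pipeline), one could fan out each LLaVA-style image-caption record into one record per target language via machine translation. The `translate` stub, the language codes, and the record fields below are all illustrative assumptions.

```python
# Hypothetical sketch: expand an English image-text pretraining record
# (LLaVA-style) into one record per target language. The translate()
# stub stands in for a real MT model; the language list is assumed.

LANGUAGES = ["en", "zh", "fr", "es", "ru", "hi", "ja", "ar"]  # eight illustrative languages

def translate(text, target_lang):
    # Placeholder: a real pipeline would call an MT system here.
    if target_lang == "en":
        return text
    return f"[{target_lang}] {text}"

def expand_record(record):
    """Yield one copy of an image-caption record per target language."""
    for lang in LANGUAGES:
        yield {
            "image": record["image"],  # image path is shared across languages
            "language": lang,
            "caption": translate(record["caption"], lang),
        }

sample = {"image": "coco/000000123.jpg", "caption": "A dog on a beach."}
multilingual = list(expand_record(sample))
print(len(multilingual))  # one record per language -> 8
```

A real dataset build would add translation quality filtering, since noisy machine translations in low-resource languages directly degrade pretraining.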
May-16-2025
- Country:
- Asia
- Bangladesh (0.04)
- India > Jharkhand
- Dhanbad (0.04)
- Europe
- Germany
- Berlin (0.04)
- Hesse > Darmstadt Region
- Darmstadt (0.04)
- Switzerland > Zürich
- Zürich (0.14)
- North America > United States
- Indiana (0.04)
- Pennsylvania (0.04)
- Genre:
- Research Report (0.40)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Large Language Model (0.97)
- Vision (1.00)