VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models
Ju, Jeongho, Kim, Daeyoung, Park, SunYoung, Kim, Youngjune
arXiv.org Artificial Intelligence
In this paper, we introduce VARCO-VISION, an open-source Korean-English vision-language model (VLM). We adopt a step-by-step training strategy that allows the model to learn both linguistic and visual information while preserving the backbone model's knowledge. Compared to models of similar size, our model demonstrates outstanding performance in diverse settings that require bilingual image-text understanding and generation. VARCO-VISION is also capable of grounding, referring, and OCR, which expands its potential applications in real-world scenarios. In addition to the model, we release five Korean evaluation datasets: four closed-set benchmarks and one open-set benchmark. We anticipate that this milestone will broaden the opportunities for AI researchers aiming to train VLMs. VARCO-VISION is available at https://huggingface.co/NCSOFT/VARCO-VISION-14B.
Nov-28-2024
- Country:
- North America > Mexico
- Mexico City > Mexico City (0.04)
- Europe
- United Kingdom (0.04)
- Ireland (0.04)
- Asia
- South Korea (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Leisure & Entertainment (0.68)
- Consumer Products & Services > Restaurants (0.46)
- Technology: