Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

Liu, Junling; Wang, Ziming; Ye, Qichen; Chong, Dading; Zhou, Peilin; Hua, Yining

Nov-1-2023–arXiv.org Artificial Intelligence

Large Language Models (LLMs) have ushered in a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English, and of models that can interpret multi-modal input, both of which are crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language model designed to integrate the analysis of textual and visual data. Qilin-Med-VL combines a pre-trained Vision Transformer (ViT) with a foundational LLM. It undergoes a thorough two-stage curriculum training process consisting of feature alignment and instruction tuning. This method enhances the model's ability to generate medical captions and answer complex medical queries. We also release ChiMed-VL, a dataset of more than 1M image-text pairs, carefully curated to enable detailed and comprehensive interpretation of medical data across various types of images.
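The abstract states only that a pre-trained ViT is combined with an LLM and trained in two stages (feature alignment, then instruction tuning). The following is a minimal, hypothetical sketch of how such a bridge is often wired, not the authors' implementation. The single linear projector, the prepending of visual tokens to the text embeddings, and the per-stage freezing pattern (similar to LLaVA's two-stage recipe) are all assumptions, and every name below (`linear_project`, `build_input_sequence`, `StageConfig`, `CURRICULUM`, `trainable_modules`) is illustrative.

```python
"""Hypothetical sketch of a ViT-to-LLM bridge and a two-stage training schedule.

Assumptions (not stated in the abstract): a single linear projector, visual
tokens prepended to the text embeddings, and the per-stage freezing pattern.
"""
from dataclasses import dataclass
from typing import List

Vector = List[float]


def linear_project(features: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Project each ViT patch feature (length d_vis) to the LLM embedding size d_llm.

    `weight` is a d_llm x d_vis matrix given as a list of rows; `bias` has length d_llm.
    """
    if len(weight) != len(bias):
        raise ValueError("weight rows and bias must both have length d_llm")
    projected = []
    for feature in features:
        if any(len(row) != len(feature) for row in weight):
            raise ValueError("each weight row must have length d_vis")
        # y = W x + b, computed one output dimension (row) at a time
        projected.append(
            [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]
        )
    return projected


def build_input_sequence(visual_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    """Prepend projected visual tokens to the text embeddings (assumed ordering)."""
    return visual_tokens + text_embeddings


@dataclass(frozen=True)
class StageConfig:
    name: str
    train_vit: bool
    train_projector: bool
    train_llm: bool


# Hypothetical freezing pattern: the abstract names the two stages only.
CURRICULUM = [
    # Stage 1: learn to map visual features into the LLM's embedding space.
    StageConfig("feature_alignment", train_vit=False, train_projector=True, train_llm=False),
    # Stage 2: tune on instruction-style medical image-text data.
    StageConfig("instruction_tuning", train_vit=False, train_projector=True, train_llm=True),
]


def trainable_modules(stage: StageConfig) -> List[str]:
    """Return the names of the modules updated in the given stage."""
    flags = {"vit": stage.train_vit, "projector": stage.train_projector, "llm": stage.train_llm}
    return [name for name, enabled in flags.items() if enabled]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0]]            # two patch features, d_vis = 2
    weight = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]  # d_llm = 3
    bias = [0.0, 0.0, 1.0]
    visual = linear_project(patches, weight, bias)
    print(visual)  # [[1.0, 3.0, 1.5], [2.0, 4.0, 1.5]]
    sequence = build_input_sequence(visual, [[0.0, 0.0, 0.0]])
    print(len(sequence))  # 3
    for stage in CURRICULUM:
        print(stage.name, trainable_modules(stage))
```

Keeping the ViT frozen and training only the projector in the first stage is one common way to align visual features with an LLM's embedding space cheaply before the more expensive instruction-tuning stage; whether Qilin-Med-VL follows this exact pattern is not stated here.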

general healthcare, qilin-med-vl, vision-language model

Genre:
- Research Report (0.40)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
