LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
Le-Duc, Khai, Zhang, Ryan, Nguyen, Ngoc Son, Pham, Tan-Hanh, Dao, Anh, Ngo, Ba Hung, Nguyen, Anh Totti, Hy, Truong-Son
arXiv.org Artificial Intelligence
Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose LiteGPT, a unified framework for medical imaging. We leverage multiple pre-trained visual encoders to enrich the visual information and enhance the performance of vision-language models. To the best of our knowledge, this is the first study to utilize vision-language models for the novel task of joint localization and classification in medical images. We also pioneer baselines for disease localization in chest X-rays. Finally, we set a new state of the art in image classification on the well-benchmarked VinDr-CXR dataset. All code and models are publicly available online: https://github.com/leduckhai/LiteGPT
Jul-15-2024
- Country:
  - North America
  - Europe
    - Ireland (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
  - Asia
    - South Korea (0.04)
    - China (0.04)
    - Vietnam > Hanoi
      - Hanoi (0.04)
    - Middle East > Qatar
- Genre:
  - Research Report > New Finding (0.67)
- Industry:
  - Health & Medicine
    - Therapeutic Area (1.00)
    - Nuclear Medicine (1.00)
    - Health Care Technology (1.00)
    - Diagnostic Medicine > Imaging (1.00)
- Technology: