A Lightweight Large Vision-language Model for Multimodal Medical Images