Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model