ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
