Improving Fine-grained Visual Understanding in VLMs through Text-Only Training
