Testing the limits of fine-tuning to improve reasoning in vision language models
