Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions