Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation