ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models