Scalable Vision Language Model Training via High Quality Data Curation