Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment