ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
