Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
