Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
–Neural Information Processing Systems
Chain of thought reasoning has demonstrated remarkable success in large language models, yet its adaptation to vision-language reasoning remains an open challenge with unclear best practices. Existing attempts typically employ reasoning chains at a coarse-grained level, which struggles to perform fine-grained structured reasoning and, more importantly, are difficult to evaluate the reward and quality of intermediate reasoning.
Neural Information Processing Systems
Jun-13-2026, 17:06:51 GMT
- Technology: