Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
