Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
