CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
