Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Open in new window