Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Neural Information Processing Systems
Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps.
Mar-27-2025, 16:07:56 GMT
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Health & Medicine (0.66)
- Leisure & Entertainment > Games (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.71)
- Natural Language
- Chatbot (0.96)
- Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)