CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders
–Neural Information Processing Systems
CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It does not require any additional training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, which biases drawings towards simpler human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods, as well as highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse styles, and scaling from simple to complex visual representations as stroke count increases.
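The synthesis-through-optimization idea described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a fixed random linear map as a stand-in for the whole "render strokes, then encode with CLIP" pipeline, and a hand-written vector as a stand-in for the text embedding. All names (`encode`, `cosine_grad`, `optimize_strokes`) are hypothetical. What it shows is only the loop structure: the encoder stays frozen, and gradient ascent on cosine similarity updates the stroke parameters alone.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def encode(W, params):
    """Stand-in 'image encoder' (assumption): a frozen linear map W @ params.

    In the real method this would be a differentiable rasterizer followed
    by the pre-trained CLIP image encoder.
    """
    return [sum(w * p for w, p in zip(row, params)) for row in W]


def cosine_grad(W, params, target):
    """Analytic gradient of cosine(encode(W, params), target) w.r.t. params."""
    e = encode(W, params)
    norm_e = math.sqrt(sum(x * x for x in e))
    norm_t = math.sqrt(sum(t * t for t in target))
    c = cosine(e, target)
    # d cos / d e_i = t_i / (|e| |t|) - cos * e_i / |e|^2
    d_e = [t / (norm_e * norm_t) - c * x / (norm_e ** 2) for x, t in zip(e, target)]
    # Chain rule through the linear encoder: d/d params_j = sum_i d_e[i] * W[i][j]
    return [sum(d_e[i] * W[i][j] for i in range(len(W))) for j in range(len(params))]


def optimize_strokes(target, n_params=16, steps=300, lr=0.1, seed=0):
    """Gradient ascent on similarity; only the stroke parameters are updated."""
    rng = random.Random(seed)
    # Frozen encoder weights: never updated, mirroring "no additional training".
    W = [[rng.gauss(0, 1) for _ in range(n_params)] for _ in range(len(target))]
    # Flat list standing in for stroke control points, widths, colors, etc.
    params = [rng.gauss(0, 1) for _ in range(n_params)]
    history = [cosine(encode(W, params), target)]
    for _ in range(steps):
        grad = cosine_grad(W, params, target)
        params = [p + lr * g for p, g in zip(params, grad)]
        history.append(cosine(encode(W, params), target))
    return params, history


if __name__ == "__main__":
    # Hypothetical 8-dimensional "text embedding" for illustration only.
    text_embedding = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0, 0.5, 0.1]
    _, history = optimize_strokes(text_embedding)
    print(f"similarity before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Because the optimized variables are a small set of stroke parameters rather than every pixel, the search space is constrained to what strokes can express, which is the bias towards simpler shapes that the abstract attributes to operating over vector strokes.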
Apr-25-2026, 01:52:22 GMT
- Country:
- North America > United States
- Massachusetts (0.28)
- Asia > Japan
- Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
- Industry:
- Information Technology (0.69)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Machine Learning > Neural Networks (1.00)
- Natural Language (0.90)
- Representation & Reasoning (0.67)