CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

Apr-25-2026, 01:52:22 GMT–Neural Information Processing Systems

CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It does not require any additional training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, which biases drawings towards simpler human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse styles, and scaling from simple to complex visual representations as stroke count increases.
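
The synthesis-through-optimization idea in the abstract can be illustrated with a toy sketch: stroke parameters are adjusted by gradient ascent so that the encoding of the drawing moves closer, in cosine similarity, to the encoding of the text. This is not the CLIPDraw implementation. The abstract describes a pre-trained CLIP encoder scoring a rendered drawing; here both rendering and encoding are collapsed into a fixed random linear map so the example runs with the standard library alone. All names (`make_toy_encoder`, `encode_drawing`, `similarity_gradient`, `optimize_strokes`), the step size, and the step count are hypothetical choices made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def make_toy_encoder(embed_dim, n_params, seed=0):
    # ASSUMPTION: a fixed random linear map stands in for
    # "render the strokes, then run the image encoder". It is NOT CLIP.
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
            for _ in range(embed_dim)]


def encode_drawing(W, strokes):
    # Map the flat list of stroke parameters to an embedding vector.
    return [dot(row, strokes) for row in W]


def similarity_gradient(W, strokes, text_emb):
    # Analytic gradient of cosine(W @ strokes, text_emb) w.r.t. strokes.
    u = encode_drawing(W, strokes)
    norm_u = math.sqrt(dot(u, u))
    norm_t = math.sqrt(dot(text_emb, text_emb))
    s = dot(u, text_emb) / (norm_u * norm_t)
    grad_u = [(t / norm_t - s * ui / norm_u) / norm_u
              for ui, t in zip(u, text_emb)]
    # Chain rule through the linear map: W^T @ grad_u.
    return [sum(W[i][j] * grad_u[i] for i in range(len(W)))
            for j in range(len(strokes))]


def optimize_strokes(W, text_emb, n_params, steps=200, lr=0.05, seed=0):
    # Start from random stroke parameters and follow the similarity gradient.
    rng = random.Random(seed)
    strokes = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    history = [cosine(encode_drawing(W, strokes), text_emb)]
    for _ in range(steps):
        grad = similarity_gradient(W, strokes, text_emb)
        strokes = [p + lr * g for p, g in zip(strokes, grad)]
        history.append(cosine(encode_drawing(W, strokes), text_emb))
    return strokes, history


if __name__ == "__main__":
    W = make_toy_encoder(embed_dim=4, n_params=8, seed=1)
    text_emb = [1.0, 0.5, -0.5, 0.25]  # hypothetical "text" embedding
    strokes, history = optimize_strokes(W, text_emb, n_params=8)
    print("similarity before:", history[0])
    print("similarity after: ", history[-1])
```

The key design point mirrored from the abstract is that the encoder is frozen: only the drawing's parameters change, so no additional training is involved. In a real setting the linear map would be replaced by a differentiable path from vector strokes to an image and then through the pre-trained encoder.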

clipdraw, machine learning, natural language, (17 more...)

Country:
- North America > United States
  - Massachusetts (0.28)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Industry:
- Information Technology (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (1.00)
  - Natural Language (0.90)
  - Representation & Reasoning (0.67)
