CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero

arXiv.org Artificial Intelligence 

While recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation based on a two-stage training process, which depends only on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method not only demonstrates promising zero-shot generalization, but also avoids expensive inference-time optimization and can generate multiple shapes for a given text prompt.

Figure 1: CLIP-Forge generates meaningful shapes without using any shape-text pairing labels. Example prompts: "a cuboid sofa", "a round sofa", "an airplane", "a space shuttle", "an suv", "a pickup truck".
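To make the described pipeline concrete, here is a minimal sketch of the zero-shot inference flow the abstract implies: encode the text prompt with a pre-trained CLIP model, then map that embedding to shape latents that a decoder turns into shapes. The `flow` object (a conditional generative model over shape latents), its `latent_dim` attribute and `sample` method, and the `shape_decoder` are hypothetical stand-ins, not the authors' released code; only the `clip` package calls are real APIs.

```python
# Hedged sketch of CLIP-Forge-style zero-shot text-to-shape inference.
# Assumes a trained shape decoder and a conditional flow over shape
# latents exist; both are hypothetical placeholders here.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def text_to_shapes(prompt: str, flow, shape_decoder, num_shapes: int = 3):
    """Generate several candidate shapes for one text prompt."""
    with torch.no_grad():
        tokens = clip.tokenize([prompt]).to(device)
        text_emb = clip_model.encode_text(tokens).float()
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        # Sampling several noise vectors conditioned on the same text
        # embedding yields multiple distinct shapes per prompt, matching
        # the "multiple shapes for a given text" property.
        noise = torch.randn(num_shapes, flow.latent_dim, device=device)
        cond = text_emb.expand(num_shapes, -1)
        shape_latents = flow.sample(noise, condition=cond)  # hypothetical API
        return shape_decoder(shape_latents)  # e.g. voxel or occupancy grids
```

Because generation is a single forward pass through CLIP, the flow, and the decoder, no per-prompt optimization is needed at inference time, which is the efficiency property the abstract highlights.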