CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Sanghi, Aditya; Chu, Hang; Lambourne, Joseph G.; Wang, Ye; Cheng, Chin-Yi; Fumero, Marco
arXiv.org Artificial Intelligence
While recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation based on a two-stage training process, which depends only on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method not only demonstrates promising zero-shot generalization, but also avoids expensive inference-time optimization and can generate multiple shapes for a given text.

Figure 1: CLIP-Forge generates meaningful shapes without using any shape-text pairing labels. Example text prompts shown: "a cuboid sofa", "a round sofa", "an airplane", "a space shuttle", "an suv", "a pickup truck".
Oct-6-2021