ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

Petrov, Dmitry, Goyal, Pradyumn, Shivashok, Divyansh, Tao, Yuanming, Averkiou, Melinos, Kalogerakis, Evangelos

arXiv.org Artificial Intelligence

To address this, conditioning methods have been proposed, such as ControlNet [51] and IP-Adapter [48], that aim to capture the desired shape or form more explicitly through the use of edge or depth maps as input conditions. Despite these advancements, current text- and image-conditioned synthesis approaches still face a number of challenges. First, they often struggle to balance textual and visual conditions when the text describes a particular context that should be combined with the target shape to guide image synthesis (Figure 1, top row). Second, commonly used visual conditions such as edge or depth maps are limited to a single viewpoint, resulting in a loss of valuable 3D shape information when users seek image variations of an underlying shape from different poses. Third, even when these models accurately reflect the target shape in specific views, users may want to explore shape variations, yet current models often lack flexible controls for such exploration. To overcome these challenges, we propose ShapeWords, a method designed to generate images that faithfully adhere to both the text prompt and a target 3D shape geometry.

arXiv:2412.02912v1
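To make the single-viewpoint limitation concrete, the sketch below builds the kind of edge-map condition that ControlNet-style models consume, here with a simple Sobel filter over one rendered view (real pipelines typically use Canny edges or depth maps). This is an illustration of the input format only, not the method of this paper; the synthetic `view` image and the `sobel_edge_map` helper are assumptions for the example. Note that the resulting map encodes only the silhouette visible from this one viewpoint, which is exactly the 3D information loss the paragraph above describes.

```python
import numpy as np

def sobel_edge_map(img: np.ndarray) -> np.ndarray:
    """Sobel edge-magnitude map, normalized to [0, 1].

    Illustrative stand-in for the single-view edge conditions
    (e.g. Canny maps) fed to ControlNet-style models.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-gradient kernel is the transpose
    pad = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    return mag / mag.max() if mag.max() > 0 else mag

# A synthetic "rendered view": a bright square on a dark background.
view = np.zeros((32, 32))
view[8:24, 8:24] = 1.0
edges = sobel_edge_map(view)
```

Flat regions (inside or outside the square) produce zero response, so the condition carries only the object's outline as seen from this single pose; a different viewpoint would require a different edge map.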