Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Tewel, Yoad, Gal, Rinon, Samuel, Dvir, Atzmon, Yuval, Wolf, Lior, Chechik, Gal

Nov-12-2024–arXiv.org Artificial Intelligence

Figure 1: Given an input image (left in each pair), either real (top row) or generated (middle row), along with a simple textual prompt describing an object to be added, Add-it seamlessly adds the object to the image in a natural way. Add-it allows the step-by-step creation of complex scenes without the need for optimization or pre-training.

Adding objects to images based on text instructions is a challenging task in semantic image editing, requiring a balance between preserving the original scene and seamlessly integrating the new object in a fitting location. Despite extensive efforts, existing models often struggle with this balance, particularly with finding a natural location for adding an object in complex scenes. We introduce Add-it, a training-free approach that extends diffusion models' attention mechanisms to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. Our weighted extended-attention mechanism maintains structural consistency and fine details while ensuring natural object placement. Human evaluations show that Add-it is preferred in over 80% of cases, and it also demonstrates improvements in various automated metrics. Our code and data will be available at: https://research.nvidia.com/labs/par/addit/

Adding objects to images based on textual instructions is a challenging task in image editing, with numerous applications in computer graphics, content creation and synthetic data generation. A creator may want to use text-to-image models to iteratively build a complex visual scene, while autonomous driving researchers may wish to draw pedestrians in new scenarios for training their car-perception system. Despite considerable recent research efforts on text-based editing, this particular task remains a challenge.
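
To make the idea of attending over three sources concrete, the following is a minimal NumPy sketch of a weighted extended-attention step. It is not the authors' implementation: the function name `weighted_extended_attention`, the source names, and the choice to apply each weight as a log-bias on the attention logits are assumptions made for illustration only. The abstract does not specify how the weighting is computed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_extended_attention(q, sources, weights):
    """Attend from queries `q` over the concatenated keys/values of several sources.

    q:       (n_q, d) query matrix of the image being generated.
    sources: dict mapping a source name to a (keys, values) pair,
             with keys of shape (n_i, d) and values of shape (n_i, d_v).
    weights: dict mapping a source name to a positive scalar (hypothetical
             weighting scheme, assumed here for illustration).
    """
    d = q.shape[-1]
    logits, values = [], []
    for name, (k, v) in sources.items():
        # Adding log(w) to a source's logits multiplies its unnormalized
        # attention mass by w before the joint softmax.
        logits.append(q @ k.T / np.sqrt(d) + np.log(weights[name]))
        values.append(v)
    attn = softmax(np.concatenate(logits, axis=1), axis=-1)
    # Return the attended output and the attention map over all sources.
    return attn @ np.concatenate(values, axis=0), attn
```

In this sketch, raising the weight of one source (for example the scene image) shifts attention mass toward its tokens, which is one simple way to trade off scene preservation against the influence of the prompt and of the generated image.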

artificial intelligence, machine learning, natural language

Country:
- Europe > Germany (0.14)

Genre:
- Research Report (0.64)

Industry:
- Information Technology (0.54)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)