ColorEdit: Training-Free Image-Guided Color Editing with Diffusion Model
Xingxi Yin, Zhi Li, Jingfeng Zhang, Chenglin Li, Yin Zhang
arXiv.org Artificial Intelligence
Text-to-image (T2I) diffusion models, with their impressive generative capabilities, have been adopted for image editing tasks and have demonstrated remarkable efficacy. However, due to attention leakage and collision between the cross-attention map of the object and the new color attribute in the text prompt, text-guided image editing methods may fail to change an object's color, leaving the resulting image misaligned with the prompt. In this paper, we conduct an in-depth analysis of the text-guided image synthesis process and of the semantic information that different cross-attention blocks learn. We observe that the visual representation of an object is determined in the up-blocks of the diffusion model during the early stage of denoising, and that color adjustment can be achieved by aligning the value matrices in the cross-attention layers. Building on these findings, we propose a straightforward yet stable and effective image-guided method for modifying an object's color without any additional fine-tuning or training. Lastly, we present COLORBENCH, the first benchmark dataset for evaluating the performance of color-change methods. Extensive experiments validate the effectiveness of our method for object-level color editing and show that it surpasses popular text-guided image editing approaches on both synthesized and real images.
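To make the value-matrix alignment idea concrete, here is a minimal sketch of what such an edit could look like inside a single-head cross-attention layer: queries and keys are computed as usual (preserving the object's layout), while the value vectors at the edited object's token positions are replaced with values derived from a guidance embedding carrying the target color. All names here (`cross_attention_with_value_alignment`, `guide_text_emb`, `object_token_ids`) are hypothetical illustrations, not the authors' implementation, and multi-head projections and other details of real U-Net attention blocks are omitted.

```python
import torch

@torch.no_grad()
def cross_attention_with_value_alignment(
    hidden_states,      # (B, N, C) spatial features from the denoising U-Net
    src_text_emb,       # (B, L, D) embeddings of the original prompt
    guide_text_emb,     # (B, L, D) guidance embeddings carrying the target color
    to_q, to_k, to_v,   # the layer's projection modules (nn.Linear)
    object_token_ids,   # list of token indices belonging to the edited object
):
    """Hypothetical sketch: keep Q and K from the source prompt so the
    attention map (object layout) is unchanged, but align the value
    matrix at the object tokens with the guidance embedding, so
    appearance (color) changes while structure is preserved."""
    q = to_q(hidden_states)           # (B, N, C_inner)
    k = to_k(src_text_emb)            # (B, L, C_inner)
    v = to_v(src_text_emb)            # (B, L, C_inner)

    # Value-matrix alignment: overwrite the object tokens' values with
    # the guidance values (assumed mechanism, not the authors' exact update).
    v_guide = to_v(guide_text_emb)
    v[:, object_token_ids, :] = v_guide[:, object_token_ids, :]

    # Standard scaled dot-product attention over the (unchanged) map.
    attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v                   # (B, N, C_inner)
```

Because only the V path is modified, the attention map itself, which encodes where the object sits in the image, stays tied to the source prompt; that is why a substitution of this kind can plausibly recolor an object without disturbing its shape.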
Nov-15-2024