Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention
Cho, Wonwoong, Zhang, Yanxia, Chen, Yan-Ying, Inouye, David I.
arXiv.org Artificial Intelligence
Blending visual and textual concepts into a new visual concept is a unique and powerful human trait that can fuel creativity. In practice, however, cross-modal conceptual blending by humans is prone to cognitive biases such as design fixation, which can trap designers in local minima of the design space. In this paper, we propose IT-Blender, a text-to-image (T2I) diffusion adapter that automates the blending process to enhance human creativity. Prior work on cross-modal conceptual blending is limited either in encoding a real image without loss of detail or in disentangling the image and text inputs. To address these gaps, IT-Blender leverages pretrained diffusion models, namely Stable Diffusion (SD) and FLUX, to blend the latent representations of a clean reference image with those of the noisy generated image. Combined with our novel blended attention, IT-Blender encodes the real reference image without loss of detail and blends its visual concept with the object specified by the text in a disentangled way. Our experimental results show that IT-Blender outperforms the baselines by a large margin in blending visual and textual concepts, shedding light on a new application of image generative models to augment human creativity.
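The abstract describes blended attention only at a high level. As a rough illustration of the general idea of letting the noisy generated-image tokens also attend to features of a clean reference image, here is a minimal single-head NumPy sketch. It is not IT-Blender's actual formulation: the function names, the additive combination, and the scale `lam` are hypothetical assumptions, loosely in the spirit of the decoupled attention used by adapters such as IP-Adapter. It also assumes the reference keys and values (`k_ref`, `v_ref`) have already been obtained by passing the clean reference latent through the same pretrained denoiser.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention: q (n_q, d), k (n_k, d), v (n_k, d_v).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v


def blended_attention_sketch(q_gen, k_gen, v_gen, k_ref, v_ref, lam=0.5):
    """Hypothetical blend, NOT the paper's method.

    The generated (noisy) tokens run ordinary self-attention, and the same
    queries additionally attend to keys/values from the clean reference
    image; the reference term is added with an assumed scale `lam`.
    """
    self_out = attention(q_gen, k_gen, v_gen)  # generated tokens attend to themselves
    ref_out = attention(q_gen, k_ref, v_ref)   # same queries attend to reference tokens
    return self_out + lam * ref_out


# Usage with random features: 4 generated tokens, 6 reference tokens, width 8.
rng = np.random.default_rng(0)
n_gen, n_ref, d = 4, 6, 8
q, k, v = (rng.standard_normal((n_gen, d)) for _ in range(3))
k_ref, v_ref = (rng.standard_normal((n_ref, d)) for _ in range(2))
out = blended_attention_sketch(q, k, v, k_ref, v_ref, lam=0.5)
print(out.shape)  # -> (4, 8)
```

Setting `lam=0` recovers ordinary self-attention, so the scale acts as a knob on how strongly the reference image influences generation. A real adapter would run this per head and per batch inside the denoiser, and may combine the two terms differently.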
Jul-15-2025
- Country:
- Europe
- Germany > Bavaria
- Upper Bavaria > Munich (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report > New Finding (0.48)