

Imagine for Me: Creative Conceptual Blending of Real Images and Text via Blended Attention

Cho, Wonwoong, Zhang, Yanxia, Chen, Yan-Ying, Inouye, David I.

arXiv.org Artificial Intelligence

Blending visual and textual concepts into a new visual concept is a unique and powerful trait of human beings that can fuel creativity. However, in practice, cross-modal conceptual blending for humans is prone to cognitive biases, like design fixation, which leads to local minima in the design space. In this paper, we propose a text-to-image (T2I) diffusion adapter, "IT-Blender," that can automate the blending process to enhance human creativity. Prior work on cross-modal conceptual blending is limited either in encoding a real image without loss of detail or in disentangling the image and text inputs. To address these gaps, IT-Blender leverages pretrained diffusion models (SD and FLUX) to blend the latent representations of a clean reference image with those of the noisy generated image. Combined with our novel blended attention, IT-Blender encodes the real reference image without loss of detail and blends the visual concept with the object specified by the text in a disentangled way. Our experimental results show that IT-Blender outperforms the baselines by a large margin in blending visual and textual concepts, shedding light on a new application of image generative models for augmenting human creativity.
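The abstract describes blending the latents of a clean reference image with those of the noisy generated image via a "blended attention" mechanism. The following is a minimal, hypothetical sketch of one way such an operation could look: queries from the generated latent attend jointly over keys/values drawn from both the generated latent and the reference latent, so reference details can flow into the generation. The function name, argument names, and the identity key/value projections are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blended_attention(q_gen, kv_gen, kv_ref, scale=None):
    """Illustrative sketch (not the paper's implementation):
    q_gen  -- (N_gen, d) queries from the noisy generated latent
    kv_gen -- (N_gen, d) tokens from the generated latent
    kv_ref -- (N_ref, d) tokens from the clean reference-image latent
    Queries attend over the concatenation of both token sets, so the
    reference image can inject detail into the generated features."""
    d = q_gen.shape[-1]
    scale = scale if scale is not None else 1.0 / np.sqrt(d)
    k = np.concatenate([kv_gen, kv_ref], axis=0)   # (N_gen + N_ref, d)
    v = k  # identity projections, purely for simplicity of the sketch
    attn = softmax(q_gen @ k.T * scale, axis=-1)   # (N_gen, N_gen + N_ref)
    return attn @ v                                # (N_gen, d)
```

In a real diffusion adapter, the query/key/value projections would be learned and this operation would replace or augment the self-attention layers of the pretrained model; the sketch only conveys the shape of the joint attention over two latent sources.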