Sequential Attention GAN for Interactive Image Editing via Dialogue
Cheng, Yu, Gan, Zhe, Li, Yitong, Liu, Jingjing, Gao, Jianfeng
In this paper, we introduce a new task: interactive image editing via conversational language, where users guide an agent to edit images through multi-turn dialogue in natural language. In each dialogue turn, the agent takes a source image and a natural language description from the user as input, and generates a target image that follows the textual description. Two new datasets, Zap-Seq and DeepFashion-Seq, are created for this task and collected via crowdsourcing. For this task, we propose a new Sequential Attention Generative Adversarial Network (SeqAttnGAN) framework, which applies a neural state tracker to encode both the source image and the textual description, and generates high-quality images in each dialogue turn. To achieve better region-specific text-to-image generation, we also introduce an attention mechanism into the model. Experiments on the two datasets, including quantitative evaluation and a user study, show that our model outperforms state-of-the-art approaches in both image quality and text-to-image consistency.
Dec-19-2018
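The abstract does not give the equations for the state tracker or the attention mechanism. The following is a minimal Python sketch of the per-turn loop under loudly stated assumptions. The "state tracker" here is a hypothetical fixed-weight blend of the previous state with the current image and text encodings, standing in for a learned recurrent unit. The generator is replaced by an identity stand-in. Region-specific attention is plain dot-product attention of one image-region feature over word embeddings, a common formulation in AttnGAN-style models that may differ from SeqAttnGAN's own. All function names (`update_state`, `word_attention`, `run_dialogue`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(region: Vector, words: List[Vector]) -> Vector:
    """Dot-product attention of one image-region feature over word embeddings.

    Assumption: an AttnGAN-style formulation, not necessarily SeqAttnGAN's.
    Returns the attention-weighted word context vector for this region.
    """
    weights = softmax([dot(region, w) for w in words])
    dim = len(words[0])
    return [sum(a * w[k] for a, w in zip(weights, words)) for k in range(dim)]


def update_state(state: Vector, image_code: Vector, text_code: Vector,
                 decay: float = 0.5) -> Vector:
    """Hypothetical state tracker: blend the previous dialogue state with the
    average of the current turn's image and text encodings."""
    return [decay * s + (1 - decay) * (i + t) / 2
            for s, i, t in zip(state, image_code, text_code)]


def run_dialogue(image_code: Vector, turns: List[Vector]) -> Vector:
    """Run one state update per user turn; the new state is fed back as the
    next source image code (identity stand-in for a real generator)."""
    state = [0.0] * len(image_code)
    for text_code in turns:
        state = update_state(state, image_code, text_code)
        image_code = list(state)  # hypothetical stand-in for G(state)
    return image_code
```

With a region feature of `[10.0, 0.0]` and two orthogonal word embeddings, almost all attention weight lands on the first word, so that region's context vector is close to the first word's embedding. Carrying the state across turns is what lets each edit build on the previous ones rather than starting from scratch.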