Sequential Attention GAN for Interactive Image Editing via Dialogue

Cheng, Yu, Gan, Zhe, Li, Yitong, Liu, Jingjing, Gao, Jianfeng

Dec-19-2018–arXiv.org Machine Learning

In this paper, we introduce a new task: interactive image editing via conversational language, where users guide an agent to edit images through multi-turn dialogue in natural language. In each dialogue turn, the agent takes a source image and a natural language description from the user as input, and generates a target image that follows the textual description. Two new datasets are created for this task, Zap-Seq and DeepFashion-Seq, both collected via crowdsourcing. For this task, we propose a new Sequential Attention Generative Adversarial Network (SeqAttnGAN) framework, which applies a neural state tracker to encode both the source image and the textual descriptions, and generates high-quality images in each dialogue turn. To achieve better region-specific text-to-image generation, we also introduce an attention mechanism into the model. Experiments on the two datasets, including quantitative evaluation and a user study, show that our model outperforms state-of-the-art approaches in both image quality and text-to-image consistency.
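
The per-turn flow described in the abstract (encode the source image and the user's text into a running dialogue state, then let image regions attend to the words of the description) can be illustrated with a toy sketch. This is not the authors' SeqAttnGAN: the function names (`attend`, `update_state`, `edit_turn`), the moving-average state update, and the additive "generation" step are all assumptions made for illustration, standing in for learned networks, and the feature vectors are plain Python lists rather than real image or text encodings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(col) / count for col in zip(*vectors)]


def attend(query: Vector, keys: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: weights over keys and their weighted sum."""
    weights = softmax([dot(query, k) for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys))
               for i in range(len(keys[0]))]
    return weights, context


def update_state(state: Vector, text_vec: Vector, image_vec: Vector,
                 decay: float = 0.5) -> Vector:
    """Hypothetical state tracker: a moving average of the fused turn input.

    A real model would use a learned recurrent unit here (assumption).
    """
    fused = [t + i for t, i in zip(text_vec, image_vec)]
    return [decay * s + (1.0 - decay) * f for s, f in zip(state, fused)]


def edit_turn(state: Vector, regions: List[Vector],
              words: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """One dialogue turn: update the state, then edit each image region.

    Each region (conditioned on the state) attends over the word vectors;
    adding the attended context is a placeholder for a generator network.
    """
    state = update_state(state, mean_vec(words), mean_vec(regions))
    edited = []
    for region in regions:
        query = [r + s for r, s in zip(region, state)]
        _, context = attend(query, words)
        edited.append([r + c for r, c in zip(region, context)])
    return state, edited


if __name__ == "__main__":
    # Two image regions and two dialogue turns, all with 2-d toy features.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    dialogue = [
        [[0.5, 0.1], [0.2, 0.9]],   # turn 1: word vectors
        [[0.3, 0.3]],               # turn 2: word vectors
    ]
    state = [0.0, 0.0]
    for words in dialogue:
        # The edited image becomes the source image of the next turn.
        state, regions = edit_turn(state, regions, words)
    print(state, regions)
```

The design point the sketch tries to capture is that the state carries information across turns while attention is recomputed per region in every turn, so different parts of the image can respond to different words of the same description.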

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - Hawaii > Honolulu County > Honolulu (0.04)
- Europe > Germany
  - Saarland > Saarbrücken (0.04)

Genre:
- Research Report (1.00)

Industry:
- Media > Photography (0.63)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)
