Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing

Ekaterina Iakovleva, Fabio Pizzati, Philip Torr, Stéphane Lathuilière

Jul-29-2024–arXiv.org Artificial Intelligence

Text-based editing diffusion models show limited performance when the user's input instruction is ambiguous. To address this problem, we propose Specify ANd Edit (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, i.e., well-defined interventions to apply to the input image to satisfy the user's request. We exploit the LLM-derived instructions alongside the original one through a novel denoising guidance strategy designed specifically for this task. Experiments with three baselines on two datasets demonstrate the benefits of SANE in all setups. Moreover, our pipeline improves the interpretability of editing models and increases output diversity. We also show that our approach can be applied to any edit, whether ambiguous or not. Our code is publicly available at https://github.com/fabvio/SANE.
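The abstract does not give the form of SANE's denoising guidance. For reference only, the sketch below shows one generic way several text conditions can steer a single denoising step. It sums one classifier-free guidance term per instruction, in the style of compositional guidance. This is not SANE's method. The function name `combined_guidance`, the weights `w_orig` and `w_spec`, and the even-split default are hypothetical. The noise predictions are stand-in values rather than outputs of a real diffusion model.

```python
from typing import Optional, Sequence


def combined_guidance(
    eps_uncond,
    eps_orig,
    eps_specific: Sequence,
    w_orig: float = 7.5,
    w_spec: Optional[Sequence[float]] = None,
):
    """Hypothetical sketch: compose classifier-free guidance terms for the
    original instruction and several LLM-derived specific instructions.

    NOT the SANE guidance rule (the abstract does not specify it). Inputs may be
    floats, NumPy arrays, or tensors of matching shape; only +, -, * are used.
    """
    if w_spec is None:
        # Assumed default: spread the original weight evenly over the specific instructions.
        w_spec = [w_orig / len(eps_specific)] * len(eps_specific) if eps_specific else []
    if len(w_spec) != len(eps_specific):
        raise ValueError("need exactly one weight per specific instruction")
    # Standard classifier-free guidance term for the original (possibly ambiguous) instruction.
    eps = eps_uncond + w_orig * (eps_orig - eps_uncond)
    # One extra guidance term per specific instruction, each relative to the unconditional prediction.
    for w, eps_s in zip(w_spec, eps_specific):
        eps = eps + w * (eps_s - eps_uncond)
    return eps


if __name__ == "__main__":
    # Toy scalar "noise predictions": 0 + 1*(1-0) + 0.5*(2-0) + 0.5*(-1-0)
    print(combined_guidance(0.0, 1.0, [2.0, -1.0], w_orig=1.0, w_spec=[0.5, 0.5]))  # prints 1.5
```

Setting every specific weight to zero recovers ordinary single-instruction classifier-free guidance. In this toy formulation, the extra instructions are therefore an additive refinement on top of the original one.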

ambiguous instruction, instruction, specific instruction
