GLoD: Composing Global Contexts and Local Details in Image Generation

Yamada, Moyuru

arXiv.org Artificial Intelligence 

Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) remains a significant challenge. The models often fail to understand complex descriptions involving multiple objects, and they apply specified visual attributes to the wrong targets or ignore them. This paper presents Global-Local Diffusion (GLoD), a novel framework which allows simultaneous control over the global contexts and the local details in text-to-image generation without requiring ...

MultiDiffusion [Bar-Tal et al., 2023] places an object with specified details on a certain region using segmentation masks and a prompt for each segment. These methods work without requiring any additional training; however, they struggle to control both the global contexts (e.g., object interactions) and the local details (e.g., object colors and emotions) simultaneously. With a complex prompt containing multiple objects, the models often misinterpret specified local details, directing them to the wrong target or ignoring them, similar to the issues observed in Stable Diffusion [Rombach et al., 2022]. While splitting the complex prompt into multiple prompts allows the model to depict each object more accurately, handling the prompts independently poses limitations in addressing a global context that describes interactions and relationships between the multiple objects.
