AITopics | gligen

gligen

Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation

Süleyman, Ahmad, Biricik, Göksel

arXiv.org Artificial Intelligence, Jan-15-2025

Large-scale text-to-image (T2I) diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from natural language text captions. Multiple layout-to-image models have been developed to control the generation process by utilizing a broad array of layouts such as segmentation maps, edges, and human keypoints. In this work, we present ObjectDiffusion, a model that takes inspiration from cutting-edge image generative frameworks to seamlessly condition T2I models with new bounding-box capabilities. Specifically, we make substantial modifications to the network architecture introduced in ControlNet to integrate it with the condition processing and injection techniques proposed in GLIGEN. ObjectDiffusion is initialized with pretrained parameters to leverage the generation knowledge obtained from training on large-scale datasets. We fine-tune ObjectDiffusion on the COCO2017 training dataset and evaluate it on the COCO2017 validation dataset. Our model achieves an AP$_{50}$ of 46.6, an AR of 44.5, and an FID of 19.8, outperforming the current SOTA model trained on open-source datasets in all three metrics. ObjectDiffusion demonstrates a distinctive capability in synthesizing diverse, high-quality, high-fidelity images that seamlessly conform to the semantic and spatial control layout. Evaluated in qualitative and quantitative tests, ObjectDiffusion exhibits remarkable grounding abilities in closed-set and open-set settings across a wide variety of contexts. The qualitative assessment verifies the ability of ObjectDiffusion to generate multiple objects of different sizes and locations.
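For readers unfamiliar with the AP$_{50}$ figure quoted in the abstract: under the COCO convention, a predicted box counts as a match when its intersection-over-union (IoU) with the reference box is at least 0.5. The sketch below shows only that overlap test. The function names `iou` and `is_match_at_50` and the `(x0, y0, x1, y1)` box format are assumptions made for illustration; the abstract does not describe the authors' evaluation pipeline, and full AP also involves ranking detections by confidence, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(requested_box, detected_box):
    """True when the detected box overlaps the requested box with IoU >= 0.5."""
    return iou(requested_box, detected_box) >= 0.5


# Hypothetical boxes: the layout asked for a 10x10 object at the origin.
requested = (0.0, 0.0, 10.0, 10.0)
half_height = (0.0, 0.0, 10.0, 5.0)    # covers the top half only
shifted = (5.0, 5.0, 15.0, 15.0)       # same size, shifted diagonally

print(iou(requested, half_height), is_match_at_50(requested, half_height))
print(iou(requested, shifted), is_match_at_50(requested, shifted))
```

In this toy case the half-height box sits exactly on the 0.5 threshold and is accepted, while the diagonally shifted box overlaps too little to count.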

artificial intelligence, conditional entity, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.09194

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.50)

Industry: Transportation > Ground > Road (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Ma, Wan-Duo Kurt, Lewis, J. P., Lahiri, Avisek, Leung, Thomas, Kleijn, W. Bastiaan

arXiv.org Artificial Intelligence, Sep-26-2023

Text-guided diffusion models such as DALL-E 2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of very high quality. However, these models often struggle to compose scenes containing several key objects such as characters in specified positional relationships. The missing capability to "direct" the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work, we take a particularly straightforward approach to providing the needed direction. Drawing on the observation that the cross-attention maps for prompt words reflect the spatial layout of objects denoted by those words, we introduce an optimization objective that produces "activation" at desired positions in these cross-attention maps. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. Directed Diffusion provides easy high-level positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines to implement.
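The idea of an objective that rewards cross-attention "activation" inside a target region can be illustrated with a toy loss. This is a minimal sketch under stated assumptions, not the objective from the paper: the names `box_mask` and `placement_loss`, the normalized `(x0, y0, x1, y1)` box format, and the particular loss (one minus the fraction of attention mass inside the box) are all hypothetical choices made for illustration, and a real implementation would operate on the attention tensors of a diffusion model rather than plain lists.

```python
def box_mask(height, width, box):
    """Binary mask for a box given as (x0, y0, x1, y1) in [0, 1] coordinates."""
    x0, y0, x1, y1 = box
    r0, r1 = round(y0 * height), round(y1 * height)
    c0, c1 = round(x0 * width), round(x1 * width)
    return [
        [1.0 if r0 <= r < r1 and c0 <= c < c1 else 0.0 for c in range(width)]
        for r in range(height)
    ]


def placement_loss(attn, mask, eps=1e-8):
    """One minus the share of attention mass that falls inside the mask.

    The value is near 0 when all attention for the prompt word lies inside
    the target box and near 1 when none of it does.
    """
    inside = sum(a * m for a_row, m_row in zip(attn, mask)
                 for a, m in zip(a_row, m_row))
    total = sum(a for row in attn for a in row) + eps
    return 1.0 - inside / total


# Hypothetical 16x16 attention map for one prompt word, spread uniformly.
size = 16
uniform_attn = [[1.0 / (size * size)] * size for _ in range(size)]
# Target region: the top-left quarter of the image.
mask = box_mask(size, size, (0.0, 0.0, 0.5, 0.5))
# The same map with everything outside the box zeroed out.
focused_attn = [[a * m for a, m in zip(a_row, m_row)]
                for a_row, m_row in zip(uniform_attn, mask)]

loss_uniform = placement_loss(uniform_attn, mask)
loss_focused = placement_loss(focused_attn, mask)
print(loss_uniform, loss_focused)
```

With uniform attention only a quarter of the mass lies in the box, so the loss is high; once the attention is concentrated inside the box the loss drops to nearly zero, which is the direction an optimizer would push the map.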

attention guidance, directed diffusion, object placement, (11 more...)

arXiv.org Artificial Intelligence

2302.13153

Country:

North America > United States > New York (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Austria (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

GLIGEN gives you more control over AI image generation

#artificialintelligence, Mar-14-2023, 06:25:38 GMT

In current models, the only way to describe where an object should be placed in an AI image is with text, and that works only moderately well. Researchers now present a model that uses bounding boxes. AI image generation has rapidly evolved from diffuse visualizations to very concrete, sometimes even photorealistic results. The more detailed the specification, the better the generation can be influenced. Although details of the image composition, such as where an object should be placed, can be described with text, the generated image often reflects them only loosely.

ai image generation, gligen, information, (2 more...)

#artificialintelligence

Country: North America > United States > Wisconsin > Dane County > Madison (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.79)
Information Technology > Artificial Intelligence > Vision (0.77)
Information Technology > Sensing and Signal Processing > Image Processing (0.65)
