AITopics | conditional image synthesis

conditional image synthesis

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis

Neural Information Processing Systems, Dec-25-2025, 02:50:35 GMT

Conditional image synthesis aims to create an image according to some multi-modal guidance in the forms of textual descriptions, reference images, and image blocks to preserve, as well as their combinations. In this paper, instead of investigating these control signals separately, we propose a new two-stage architecture, UFC-BERT, to unify any number of multi-modal controls. In UFC-BERT, both the diverse control signals and the synthesized image are uniformly represented as a sequence of discrete tokens to be processed by a Transformer. Unlike existing two-stage autoregressive approaches such as DALL-E and VQGAN, UFC-BERT adopts non-autoregressive generation (NAR) at the second stage to enhance the holistic consistency of the synthesized image, to support preserving specified image blocks, and to improve the synthesis speed. Further, we design a progressive algorithm that iteratively improves the non-autoregressively generated image, with the help of two estimators: one evaluates compliance with the controls and the other evaluates the fidelity of the synthesized image. Extensive experiments on a newly collected large-scale clothing dataset, M2C-Fashion, and a facial dataset, Multi-Modal CelebA-HQ, verify that UFC-BERT can synthesize high-fidelity images that comply with flexible multi-modal controls.
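
The progressive, non-autoregressive generation described in this abstract can be illustrated with a minimal mask-predict style sketch. It is not the UFC-BERT algorithm itself: `toy_predict`, `progressive_decode`, the `MASK` id, and the linear commit schedule are all hypothetical, and a single random confidence score stands in for the paper's two estimators (control compliance and image fidelity). The sketch only shows how tokens can be proposed in parallel, committed by confidence over several rounds, and how preserved image blocks stay fixed.

```python
import random

MASK = -1  # hypothetical id for a token position that is still undecided


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for the Transformer: propose (token, confidence) for every position at once."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def progressive_decode(length, preserved, vocab_size=16, steps=4, seed=0):
    """Fill a token sequence over several parallel rounds.

    preserved maps position -> token id for image blocks that must be kept.
    Each round commits the most confident proposals among the masked positions,
    following a linear schedule so that every free position is filled by the last round.
    """
    rng = random.Random(seed)
    tokens = [preserved.get(i, MASK) for i in range(length)]
    free = [i for i in range(length) if i not in preserved]

    for step in range(1, steps + 1):
        proposals = toy_predict(tokens, vocab_size, rng)
        masked = [i for i in free if tokens[i] == MASK]
        already_filled = len(free) - len(masked)
        target_filled = len(free) * step // steps
        n_commit = target_filled - already_filled
        # Commit the highest-confidence proposals; the rest stay masked for later rounds.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = proposals[i][0]
    return tokens


if __name__ == "__main__":
    # Positions 0 and 5 play the role of preserved image blocks.
    print(progressive_decode(8, {0: 3, 5: 7}))
```

Because every round scores all positions together rather than left to right, preserved tokens can sit anywhere in the sequence, which is the property the abstract attributes to non-autoregressive generation.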

conditional image synthesis, ufc-bert, unifying multi-modal control, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

e46bc064f8e92ac2c404b9871b2a4ef2-Supplemental.pdf

Neural Information Processing Systems, Aug-18-2025, 05:52:49 GMT

artificial intelligence, synthesis, textual control, (15 more...)

Industry: Information Technology (0.30)

Technology: Information Technology > Artificial Intelligence > Vision > Face Recognition (0.48)

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis

Neural Information Processing Systems, Jan-19-2025, 10:51:20 GMT

conditional image synthesis, ufc-bert, unifying multi-modal control, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Image Synthesis From Reconfigurable Layout and Style

Sun, Wei, Wu, Tianfu

arXiv.org Machine Learning, Aug-20-2019

Despite remarkable recent progress on both unconditional and conditional image synthesis, it remains a long-standing problem to learn generative models that are capable of synthesizing realistic and sharp images from reconfigurable spatial layout (i.e., bounding boxes + class labels in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors), especially at high resolution. By reconfigurable, we mean that a model can preserve the intrinsic one-to-many mapping from a given layout to multiple plausible images with different styles, and is adaptive with respect to perturbations of a layout and style latent code. In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. Inspired by the vanilla StyleGAN, the proposed LostGAN consists of two new components: (i) learning fine-grained mask maps in a weakly-supervised manner to bridge the gap between layouts and images, and (ii) learning object instance-specific layout-aware feature normalization (ISLA-Norm) in the generator to realize multi-object style generation. In experiments, the proposed method is tested on the COCO-Stuff dataset and the Visual Genome dataset and obtains state-of-the-art performance. The code and pretrained models are available at https://github.com/iVMCL/LostGANs.
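
To make the idea of instance-specific, layout-aware normalization more concrete, here is a simplified sketch in the spirit of ISLA-Norm. It is an assumption-laden illustration, not the LostGAN implementation: the function name `isla_norm_sketch`, the use of one scalar `gamma` and `beta` per object, the single-channel feature map, and the averaging of parameters where masks overlap are all choices made for this example. The point is only that the scale and shift applied after normalization vary per pixel according to which object masks cover that pixel.

```python
import math


def isla_norm_sketch(feature, masks, gammas, betas, eps=1e-5):
    """Normalize one H x W feature map, then apply per-pixel scale and shift.

    feature: H x W nested list of floats (a single channel).
    masks:   list of H x W nested lists with 0/1 entries, one per object instance.
    gammas, betas: one scalar per object (hypothetically derived from its label and style code).
    """
    flat = [v for row in feature for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)

    out = []
    for y, row in enumerate(feature):
        out_row = []
        for x, v in enumerate(row):
            covering = [k for k, m in enumerate(masks) if m[y][x]]
            if covering:
                # Average the parameters of all objects whose mask covers this pixel.
                gamma = sum(gammas[k] for k in covering) / len(covering)
                beta = sum(betas[k] for k in covering) / len(covering)
            else:
                # Background pixels fall back to an identity affine transform.
                gamma, beta = 1.0, 0.0
            out_row.append(gamma * (v - mean) / std + beta)
        out.append(out_row)
    return out


if __name__ == "__main__":
    feature = [[1.0, 3.0]]
    masks = [[[1, 0]]]  # one object covering only the left pixel
    print(isla_norm_sketch(feature, masks, gammas=[2.0], betas=[0.5]))
```

Changing an object's `gamma` and `beta` alters only the pixels under its mask, which is one way to read the abstract's claim about multi-object style generation from a reconfigurable layout.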

image synthesis, layout, synthesis, (13 more...)

1908.075

Country:

North America > United States > North Carolina (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Conditional Image Synthesis With Auxiliary Classifier GANs

Odena, Augustus, Olah, Christopher, Shlens, Jonathon

arXiv.org Machine Learning, Jul-20-2017

Synthesizing high-resolution photorealistic images has been a long-standing challenge in machine learning. In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128x128 resolution image samples exhibiting global coherence. We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models. These analyses demonstrate that high-resolution samples provide class information not present in low-resolution samples. Across 1000 ImageNet classes, 128x128 samples are more than twice as discriminable as artificially resized 32x32 samples. In addition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data.
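
The label conditioning mentioned in this abstract is usually summarized by two log-likelihood terms: a source term L_S (real versus generated) and a class term L_C (correct label), with the discriminator maximizing L_S + L_C and the generator maximizing L_C - L_S. The abstract itself does not spell this out, so treat the following as a sketch of that commonly cited AC-GAN objective rather than the authors' code. The function name `acgan_objectives` and the toy probability lists are hypothetical.

```python
import math


def acgan_objectives(p_real_is_real, p_fake_is_fake, p_class_real, p_class_fake):
    """Compute the two AC-GAN style objectives from discriminator outputs.

    Each argument is a list of probabilities the discriminator assigned to the
    correct answer: the source (real or fake) for the first two lists, and the
    true class label for the last two.
    """
    def mean_log(probs):
        return sum(math.log(p) for p in probs) / len(probs)

    l_s = mean_log(p_real_is_real) + mean_log(p_fake_is_fake)  # source log-likelihood
    l_c = mean_log(p_class_real) + mean_log(p_class_fake)      # class log-likelihood
    return {
        "discriminator": l_s + l_c,  # the discriminator tries to maximize this
        "generator": l_c - l_s,      # the generator tries to maximize this
    }


if __name__ == "__main__":
    print(acgan_objectives([0.9, 0.8], [0.7, 0.6], [0.5, 0.4], [0.3, 0.2]))
```

Both players benefit from a higher class term, so the generator is pushed toward samples whose class is recognizable while still trying to fool the source prediction.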

artificial intelligence, machine learning, training data, (16 more...)

1610.09585

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

[Research] [1610.09585] Conditional Image Synthesis With Auxiliary Classifier GANs • /r/MachineLearning

@machinelearnbot, Nov-1-2016, 06:35:41 GMT

From the team that brought you the Deconv Checkerboard Artifacts article. This is an idea which I (and I wouldn't be surprised if many others) have thought of before, but never thought it would actually improve results. I'm glad that these guys pursued it and used it to good effect. I'm impressed, but not sure that I'm entirely sold: browsing through the full set of samples, it seems that it doesn't work any better than any other ImageNet GAN for a pretty large majority of classes. This looks more representative of most of the classes, but the fact that they're getting global coherence on some of the classes (it seems to like hotdogs/vegetables and flowers) suggests that this is a worthwhile track to pursue. I'll be throwing a CelebA experiment in the blender after the ICLR deadline, keen to see how it turns out with facial attribute labels.

artificial intelligence, conditional image synthesis, machinelearning

Technology: Information Technology > Artificial Intelligence > Vision (0.40)