eDiffi


NVIDIA's eDiffi Diffusion Model Allows Painting With Words, and More

#artificialintelligence

Attempting to make precise compositions with latent diffusion generative image models such as Stable Diffusion can be like herding cats; the very same imaginative and interpretive powers that enable the system to create extraordinary detail and to summon up extraordinary images from relatively simple text prompts are also difficult to turn off when you're looking for Photoshop-level control over an image generation.

Now, a new approach from NVIDIA research, titled ensemble diffusion for images (eDiffi), uses a mixture of multiple embedding and interpretive methods (rather than the same method all the way through the pipeline) to allow a far greater level of control over the generated content.

'Painting with words' is one of the two novel capabilities in NVIDIA's eDiffi diffusion model. Each daubed color represents a word from the prompt (see them appear on the left during generation), and the area to which a color is applied will consist only of that element. See the source (official) video for more examples and better resolution at https://www.youtube.com/watch?v=k6cOx9YjHJc

Effectively this is 'painting with masks', and it reverses the inpainting paradigm in Stable Diffusion, which is based on fixing broken or unsatisfactory images, or on extending images that could as well have been the desired size in the first place.
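The general idea behind this kind of region control is to tie each painted region to a prompt token while the image is being generated. The sketch below is a rough illustration of that idea only, not NVIDIA's actual implementation: it additively boosts a token's cross-attention scores for the pixels inside its user-painted mask before normalising. The function name, the `boost` parameter, and the tensor shapes are all assumptions for illustration.

```python
import numpy as np

def masked_attention(scores, token_masks, boost=2.0):
    """Boost cross-attention scores for each token inside its painted mask.

    scores:      (num_pixels, num_tokens) raw attention logits from image
                 positions to prompt tokens.
    token_masks: dict mapping token index -> boolean mask of shape
                 (num_pixels,), True where the user painted that token.
    """
    out = scores.copy()
    for token, mask in token_masks.items():
        # Hypothetical additive boost inside the painted region, so the
        # masked pixels attend more strongly to their assigned word.
        out[mask, token] += boost
    # Softmax over tokens per pixel to get a valid attention distribution.
    e = np.exp(out - out.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Example: 4 pixels, 2 tokens; token 0 is painted over the first 2 pixels.
scores = np.zeros((4, 2))
attn = masked_attention(scores, {0: np.array([True, True, False, False])})
```

With uniform starting logits, the painted pixels end up attending more strongly to token 0 than the unpainted ones, while each pixel's attention weights still sum to one.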


Nvidia's eDiffi is an impressive alternative to DALL-E 2 or Stable Diffusion

#artificialintelligence

Nvidia's eDiffi is a generative text-to-image AI model that, according to the company, beats alternatives such as DALL-E 2 and Stable Diffusion. Following OpenAI, Google, Midjourney, and StabilityAI, Nvidia is now showing a generative text-to-image model of its own. All major generative text-to-image models today are diffusion models; well-known examples are DALL-E 2, Midjourney, Imagen, and Stable Diffusion. These models perform image synthesis via an iterative denoising process, the eponymous diffusion: images are gradually generated from random noise.
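The iterative denoising loop described above can be caricatured in a few lines. The toy sketch below is not a real diffusion model: the trained denoising network is replaced by a hand-written stand-in that "predicts" the noise as the gap to a known target image, and the step schedule is invented. It only shows the shape of the process: start from Gaussian noise and repeatedly subtract a fraction of the predicted noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, t, target):
    """Stand-in for a trained network: 'predicts' the noise as the
    difference between the current image and a known target."""
    return x - target

def sample(target, steps=50):
    # Start from pure Gaussian noise.
    x = rng.normal(size=target.shape)
    # Iteratively remove a fraction of the predicted noise each step,
    # so the image gradually emerges from the noise.
    for t in range(steps, 0, -1):
        eps_hat = toy_denoiser(x, t, target)
        x = x - eps_hat / t
    return x

out = sample(np.ones((8, 8)))
```

Because the stand-in denoiser is exact, the final step (t = 1) removes all remaining noise and the loop converges to the target; a real model instead learns the noise prediction from data and conditions it on the text prompt.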


GAN are the days for NVIDIA

#artificialintelligence

NVIDIA's model works better than the rest on customised prompts thanks to its expert denoising system, which trains denoisers to maintain fidelity to the text prompt even in the later stages of the generation process. But this is not the first time NVIDIA has stepped into the waters of text-to-image modelling. Before coming up with eDiffi, NVIDIA used deep learning models to create versions of the GauGAN model. The second version, released in November 2021, was trained on 10 million high-quality landscape images, and its application demo allowed users to produce images from any text input they provided. The GauGAN model is based on generative adversarial networks (GANs), unlike eDiffi, which uses diffusion modelling to generate images. So why did NVIDIA depart from GANs for its text-to-image feature?
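The "expert denoising system" mentioned above refers to eDiffi's ensemble idea: rather than one denoiser handling every step, different denoisers specialise in different stages of the noise schedule. The sketch below illustrates only the routing structure; the two placeholder "experts" (simple scaling functions) and the half-way split point are assumptions, not NVIDIA's actual networks or schedule.

```python
import numpy as np

# Placeholder experts, mirroring eDiffi's ensemble idea: one handles the
# high-noise (early) steps that shape global layout, the other the
# low-noise (late) steps that refine detail and prompt fidelity.
def coarse_expert(x, t):
    return x * 0.9   # stand-in for a network specialised in early steps

def detail_expert(x, t):
    return x * 0.99  # stand-in for a network specialised in late steps

def route(t, total_steps):
    """Pick the expert responsible for the current noise level.

    Assumed split: the first half of the reverse process goes to the
    coarse expert, the second half to the detail expert."""
    return coarse_expert if t > total_steps // 2 else detail_expert

def denoise(x, total_steps=10):
    # Reverse process: each step is handled by the expert assigned
    # to that stage of the (toy) noise schedule.
    for t in range(total_steps, 0, -1):
        x = route(t, total_steps)(x, t)
    return x
```

The design point is that each expert only ever sees its own slice of the schedule, so it can be trained to do that slice well, e.g. keeping the output faithful to the text prompt late in generation, without compromising on the other stages.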