Generative AI
Understanding Diffusion Models: A Unified Perspective
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
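The abstract says a Variational Diffusion Model can be trained to predict any of three targets: the source input, the source noise, or the score. This works because the three are interchangeable through simple algebra. The sketch below assumes the variance-preserving forward process x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps used in VDM/DDPM-style derivations. Scalars stand in for image tensors, and the helper names are hypothetical. If you plug in the true sampled noise, score_from_eps gives the gradient of log q(x_t | x0). For a trained network, the same conversions link its posterior-mean estimates, and that is where Tweedie's formula comes in.

```python
import math

# Assumed variance-preserving forward process (scalars stand in for tensors):
#   x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps,   eps ~ N(0, 1),   0 < abar < 1


def x0_from_eps(x_t, eps, abar):
    """Source input implied by a prediction of the source noise."""
    return (x_t - math.sqrt(1.0 - abar) * eps) / math.sqrt(abar)


def eps_from_x0(x_t, x0, abar):
    """Source noise implied by a prediction of the source input."""
    return (x_t - math.sqrt(abar) * x0) / math.sqrt(1.0 - abar)


def score_from_eps(eps, abar):
    """Score implied by a noise prediction: -eps / sqrt(1 - abar)."""
    return -eps / math.sqrt(1.0 - abar)


def eps_from_score(score, abar):
    """Noise implied by a score prediction (inverse of score_from_eps)."""
    return -math.sqrt(1.0 - abar) * score


def x0_from_score(x_t, score, abar):
    """Tweedie's formula: E[x0 | x_t] = (x_t + (1 - abar) * score) / sqrt(abar)."""
    return (x_t + (1.0 - abar) * score) / math.sqrt(abar)


if __name__ == "__main__":
    abar, x0, eps = 0.3, 1.5, -0.7
    x_t = math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * eps  # noisify x0
    score = score_from_eps(eps, abar)
    # Both routes recover the same source input, up to floating-point rounding.
    print(x0_from_eps(x_t, eps, abar), x0_from_score(x_t, score, abar))
```

Because each target is an affine function of the others at a fixed noise level, the choice among them mainly changes how the training loss is weighted across noise levels. It does not change what the model can represent.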
We Need to Talk About How Good A.I. Is Getting
What's impressive about DALL-E 2 isn't just the art it generates. It's how it generates art. These aren't composites made out of existing internet images -- they're wholly new creations made through a complex A.I. process known as "diffusion," which starts with a random series of pixels and refines it repeatedly until it matches a given text description. And it's improving quickly -- DALL-E 2's images are four times as detailed as the images generated by the original DALL-E, which was introduced only last year. DALL-E 2 got a lot of attention when it was announced this year, and rightfully so.
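The blurb describes diffusion as starting from random pixels and refining them repeatedly. That loop can be sketched on a toy one-dimensional problem. What follows is a minimal DDPM-style ancestral sampler, not DALL-E 2's actual code. Several choices are assumptions made so that no training is needed: the linear noise schedule, the 1,000 steps, and a Gaussian "dataset" N(mu, s^2) whose ideal noise predictor has a closed form. In a real text-to-image model, a neural network conditioned on the text description would replace the hypothetical make_eps_predictor.

```python
import math
import random


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, DDPM-style choice)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def make_eps_predictor(mu, s):
    """Hypothetical stand-in for the trained network: the exact E[eps | x_t]
    for toy 1-D data x0 ~ N(mu, s^2) under x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps."""
    def predict(x_t, abar):
        var_xt = abar * s * s + (1.0 - abar)  # variance of the noisy marginal
        return math.sqrt(1.0 - abar) / var_xt * (x_t - math.sqrt(abar) * mu)
    return predict


def sample(predict_eps, betas, rng):
    """Start from pure noise and denoise step by step (ancestral DDPM sampling)."""
    abars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abars.append(prod)  # abar_t = product of (1 - beta_i) for i <= t
    x = rng.gauss(0.0, 1.0)  # the "random series of pixels", here a single value
    for t in reversed(range(len(betas))):
        beta, abar = betas[t], abars[t]
        eps_hat = predict_eps(x, abar)
        # Subtract the predicted noise and rescale (mean of the reverse step).
        x = (x - beta / math.sqrt(1.0 - abar) * eps_hat) / math.sqrt(1.0 - beta)
        if t > 0:
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)  # fresh noise, except at the last step
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_betas()
    predict = make_eps_predictor(mu=3.0, s=0.5)
    xs = [sample(predict, betas, rng) for _ in range(500)]
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))
    print(f"sample mean {mean:.2f}, sample std {std:.2f}")
```

The printed mean and standard deviation should land close to 3.0 and 0.5, the parameters of the toy data distribution. So the pure noise has been refined into samples from the target distribution. Text conditioning would enter only through the noise predictor.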
Free Text-to-Image AI Tool 'Stable Diffusion' is now publicly available
Stability AI recently announced the launch of a new text-to-image generator, 'Stable Diffusion'. The image-generating tool competes with the likes of DALL-E 2, Midjourney, Imagen and others. Unlike other text-to-image models, Stable Diffusion is open-source and has no content filter. The code and model card of Stable Diffusion are available on GitHub and HuggingFace. The company has also launched a beta version of its platform, called DreamStudio.
Here's how the best AI art generators compare
Berlin-based Fabian Stelzer, who describes himself on Twitter as a 'prompt intern' working on three AI-based projects, carried out the image comparison experiment using the text-to-image AI art generators Midjourney, DALL-E 2 and Stable Diffusion. He entered the same prompts into each tool and used a 1:1 aspect ratio for the resulting images. With prompts ranging from "low poly game asset, Cthulhu monster, 2000 video game, isometric view" to "1990s clip art of a laughing crazy fax machine, windows 3.1, MS-DOS, early computer clip art", the results that Stelzer shared in his Twitter thread allow us to compare how the tools handle different types of requests. Midjourney's creations often feel very dark – almost apocalyptic. After all, this is the tool that was used to create the "last selfie on Earth" images that were going around recently. We think this AI art generator definitely needs counselling, but it also seems to often produce the most natural results when it comes to artistic styles, particularly with textural details. Any artefacts appear natural, whereas DALL-E 2's artefacts often look like obvious digital glitches. DALL-E 2 has a tendency to throw in random invented words, but it seems to be the best tool for creating photorealistic images and for handling facial expressions. Meanwhile, Stable Diffusion seems to often produce the cleanest results – Stelzer notes that it can create incredible photos too but that you need to be careful not to "overload" the scene.
Ray, the machine learning tech behind OpenAI, levels up to Ray 2.0
Over the last two years, one of the most common ways for organizations to scale and run increasingly large and complex artificial intelligence (AI) workloads has been the open-source Ray framework, used by companies from OpenAI to Shopify and Instacart. Ray enables machine learning (ML) models to scale across hardware resources and can also be used to support MLOps workflows across different ML tools. Ray 1.0 came out in September 2020 and has had a series of iterations over the last two years. Today, the next major milestone was released, with the general availability of Ray 2.0 at the Ray Summit in San Francisco.
Do You Know How This Blog Title Optimizer Uses AI, and How Well It Works?
The AI system [Max] uses is GPT-3, a language model that produces natural-sounding, human-like text and can be adapted in various ways. The optimizer takes a blog post title as input. OpenAI's pretrained GPT-3 engine is used to generate six alternative titles. For each of those alternative titles, a fine-tuned version of GPT-3 is consulted to judge how "good" it is, based on custom training data. The custom training data used in this scoring step comes from bulk Hacker News submission data, obtained via Google's BigQuery service.
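The optimizer described above follows a generate-then-rank pattern. One model proposes six alternative titles, and a second, fine-tuned model scores each one. The sketch below shows only that control flow. fake_generate and fake_score are hypothetical stand-ins for the GPT-3 generator and the fine-tuned judge. No OpenAI API is called, and the scoring heuristic (shorter is better) is purely illustrative.

```python
from typing import Callable, List, Tuple


def optimize_title(
    title: str,
    generate: Callable[[str, int], List[str]],
    score: Callable[[str], float],
    n_candidates: int = 6,
) -> List[Tuple[float, str]]:
    """Generate alternative titles, score each one, and return them best first."""
    candidates = generate(title, n_candidates)
    scored = [(score(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return scored


def fake_generate(title: str, n: int) -> List[str]:
    # Hypothetical stand-in for the pretrained GPT-3 generator.
    templates = ["How {t}", "Why {t}", "{t}: A Deep Dive", "{t}, Explained",
                 "The Truth About {t}", "What Nobody Tells You About {t}"]
    return [tpl.format(t=title) for tpl in templates[:n]]


def fake_score(candidate: str) -> float:
    # Hypothetical stand-in for the fine-tuned GPT-3 judge: prefers shorter titles.
    return -float(len(candidate))


if __name__ == "__main__":
    for s, t in optimize_title("I Built a Robot", fake_generate, fake_score):
        print(f"{s:6.1f}  {t}")
```

Passing the generator and the scorer in as separate functions reflects the two-model design in the article. Either one can be swapped (for example, retraining the judge on different data) without touching the other.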
🍱 The Text-to-Image Synthesis Revolution
Next week, we will start a new series about text-to-image synthesis models. In the last year, this deep learning discipline has seen an astonishing level of progress. You have probably heard of OpenAI's DALL-E 2, but plenty of other impressive text-to-image generation models have been created in the last few months. We have seen Google coming up with models like Imagen and Parti; Meta has done amazing work with Make-A-Scene; OpenAI created GLIDE and, of course, DALL-E 2. All these models push the boundaries of text-to-image synthesis in ways that challenge human imagination. However, the innovation is not only coming from the big AI labs but also from startups in the space.
AI-generated art illustrates another problem with computers
It all started with the headline over an entry in Charlie Warzel's Galaxy Brain newsletter in the Atlantic: "Where Does Alex Jones Go From Here?" This is an interesting question because Jones is an internet troll so extreme that he makes Donald Trump look like Spinoza. For many years, he has parlayed a radio talkshow and a website into a comfortable multimillion-dollar business peddling nonsense, conspiracy theories, falsehoods and weird merchandise to a huge tribe of adherents. And until 4 August he had got away with it. On that day, though, he lost an epic defamation case brought against him by parents of children who died in the 2012 Sandy Hook massacre – a tragedy that he had consistently ridiculed as a staged hoax; a Texas jury decided that he should pay nearly $50m in damages for publishing this sadistic nonsense.
GANs vs. VAEs: What is the best generative AI approach?
GANs were first introduced by Ian Goodfellow and fellow researchers at the University of Montreal in 2014. They have shown tremendous promise in generating many types of realistic data. Yann LeCun, chief AI scientist at Meta, has written that GANs and their variations were "the most interesting idea in the last ten years in machine learning." For example, they have been used to generate realistic speech that mimics real speakers, improving translations by matching voices and lip movements. They have also been used to translate images from one domain to another, such as turning daytime scenes into night, and to transfer dance moves from one person's body to another's.