An artist can draw a baby daikon radish wearing a tutu and walking a dog, even if they've never seen one before. But this kind of visual mashup has long been a trickier task for computers. Now, a new artificial-intelligence model can create such images with clarity -- and cuteness. This week nonprofit research company OpenAI released DALL-E, which can generate a slew of impressive-looking, often surrealistic images from written prompts such as "an armchair in the shape of an avocado" or "a painting of a capybara sitting in a field at sunrise." (And yes, the name DALL-E is a portmanteau referencing surrealist artist Salvador Dalí and animated sci-fi film "WALL-E.") A new AI model from OpenAI, DALL-E, can create pictures from the text prompt "an illustration of a baby daikon radish in a tutu walking a dog".
For all GPT-3's flair, its output can feel untethered from reality, as if it doesn't know what it's talking about. By grounding text in images, researchers at OpenAI and elsewhere are trying to give language models a better grasp of the everyday concepts that humans use to make sense of things. DALL·E and CLIP come at this problem from different directions. At first glance, CLIP (Contrastive Language-Image Pre-training) is yet another image recognition system. Except that it has learned to recognize images not from labeled examples in curated data sets, as most existing models do, but from images and their captions taken from the internet.
DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We've found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images. GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks. Image GPT showed that the same type of neural network can also be used to generate images with high fidelity. We extend these findings to show that manipulating visual concepts through language is now within reach.
You've probably never wondered what a knight made of spaghetti would look like, but here's the answer anyway--courtesy of a clever new artificial intelligence program from OpenAI, a company in San Francisco. The program, DALL-E, released earlier this month, can concoct images of all sorts of weird things that don't exist, like avocado armchairs, robot giraffes, or radishes wearing tutus. OpenAI generated several images, including the spaghetti knight, at WIRED's request. DALL-E is a version of GPT-3, an AI model trained on text scraped from the web that's capable of producing surprisingly coherent text. DALL-E was fed images and accompanying descriptions; in response, it can generate a decent mashup image.
A neural network uses text captions to create outlandish images – such as armchairs in the shape of avocados – demonstrating it understands how language shapes visual culture. OpenAI, an artificial intelligence company that recently partnered with Microsoft, developed the neural network, which it calls DALL-E. It is a version of the company's GPT-3 language model that can create expansive written works based on short text prompts, but DALL-E produces images instead. "The world isn't just text," says Ilya Sutskever, co-founder of OpenAI. "Humans don't just talk: we also see. A lot of important context comes from looking."