 Generative AI


How Does DALL·E-2 Work?

#artificialintelligence

DALL·E-2 is a new AI system that can create realistic images and art from a description in natural language. OpenAI recently released the beta version of DALL·E-2. In this article, we will take a close look at the original research paper behind DALL·E-2 and understand exactly how it works. DALL·E-2 originates from the paper Hierarchical Text-Conditional Image Generation with CLIP Latents [1] and is based on the unCLIP model proposed there.


I Asked an AI to Dream the Solar System as Food

#artificialintelligence

As soon as I saw these new artificial intelligence image creation tools, like DALL-E, I wanted to see how well they'd work for generating space and astronomy images. I'm still on the waiting list for DALL-E 2, so I don't have any feedback to give there, but I signed up for Midjourney AI, played around with the free account, and then signed up for a full paid account, so I could test out its capabilities. How well does it work? I'm still learning to craft prompts to get the best results, but the biggest issue is that they're unscientific. If I need a picture of the Space Launch System, it needs to be the actual Space Launch System and not some kind of art deco version of a rocket that looks like it was designed in the 1950s.


Gartner research: 2 types of emerging AI near hype cycle peak

#artificialintelligence

According to new Gartner research, two types of emerging artificial intelligence (AI) -- emotion and generative AI -- are both reaching the peak of the digital advertising hype cycle. This is thanks to AI's expansion into targeting, measurement, identity resolution and even generating creative content. "I think one of the key pieces is that the options for marketers have been accelerating," Mike Froggatt, senior director analyst in the Gartner marketing practice, told VentureBeat.


Now Microsoft wants a share of the 'AI image generator' pie

#artificialintelligence

Text-to-image generative models like OpenAI's DALL-E 2 are attracting significant attention because of their ability to produce images from text prompts alone. While DALL-E 2 is the most popular, there are other budding AI image generators such as Ultraleap's 'Midjourney', Hugging Face's 'Craiyon', Meta's 'Make-A-Scene' and Google's 'Imagen'. Now, it seems that Microsoft also wants a share of the 'AI image generator' pie. Recently, Microsoft's Asia research team introduced NUWA-Infinity, a multimodal generative model designed to generate high-quality images and videos from any given text, image or video input. In its research paper, titled 'NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis', Microsoft said that it evaluated NUWA-Infinity on five high-resolution visual synthesis tasks. Compared to its predecessor 'NUWA', which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation.


Is DALL-E 2 Just 'Gluing Things Together' Without Understanding Their Relationships?

#artificialintelligence

A new research paper from Harvard University suggests that OpenAI's headline-grabbing text-to-image framework DALL-E 2 has notable difficulty in reproducing even infant-level relations between the elements that it composes into synthesized photos, despite the dazzling sophistication of much of its output. The researchers undertook a user study involving 169 crowdsourced participants, who were presented with […]


Reflecting On "Artificial General Intelligence" And AI Sentience

#artificialintelligence

Intelligence comes in many forms. Octopuses are highly intelligent--and completely unlike humans. In case you haven't noticed, artificial intelligence systems have been behaving in increasingly astonishing ways lately. OpenAI's new model DALL-E 2, for instance, can produce captivating original images based on simple text prompts. Models like DALL-E are making it harder to dismiss the notion that AI is capable of creativity. Consider, for instance, DALL-E's imaginative rendition of "a hip-hop cow in a denim jacket recording a hit single in the studio."


AI Hits Again with Doomsday Selfies! And this Time as Requested by a Tiktoker

#artificialintelligence

OpenAI, an artificial intelligence (AI) research company, has been showing off its ground-breaking AI photo generator DALL-E for more than a year. The first-generation model was capable of creating realistic and artistic images from a text description. Some of the best include an avocado-shaped armchair, Darth Vader fishing in the Arctic, and many more. Now, a group of enquiring coders asked DALL-E, 'What will be the last selfies on Earth?' The outcomes are truly terrifying. The AI generator's sample photos depict apocalyptic ghost towns, a person covered in battle scars, and thick smoke in the background, most likely caused by a nuclear bomb explosion.


AI's Cool New Trick

#artificialintelligence

One of my lingering misgivings from my days as a reporter at the Wall Street Journal concerns the role I played in unleashing PowerPoint on the world. I was just doing my job. Like all good journalists, I was at the bar, at a conference in the spring of 1987, when a consultant I knew introduced me to one of the two founders of a startup that was about to unveil PowerPoint. The demo he showed me was intriguing, and the prospects for the product checked out with a number of smart folks I interviewed at the conference, so I wrote about it. Next thing you know, Microsoft buys the startup, and we're all awash in bullet points and so many fonts and type sizes that presentations may look like ransom notes. Today, I'd like to introduce you to the latest advancement in visual presentations, an artificial intelligence that lets you generate an image based on a single sentence -- for instance, the AI produced the image above based on the prompt, "astronaut riding a horse in a photorealistic style."


Adversarial Attacks on Image Generation With Made-Up Words

arXiv.org Artificial Intelligence

Text-guided image generation models have made impressive strides in recent years. State-of-the-art models, like DALL-E 2 [1], Imagen [2], and Parti [3], can generate coherent images matching a remarkably wide variety of prompts in virtually any visual domain and style. While the ability to generate high-quality images of any subject is an exciting development for content creation, it also raises ethical questions about potential misuse of this technology. In particular, text-guided image generation models may be used to produce fake imagery of existing individuals for misinformation (so-called "deepfakes" [4]), or produce visual content deemed offensive or harmful. These concerns have been used to justify the decision to limit access to large text-guided image generation models, as well as moderate their use according to content policies implemented in prompt filters.


Finally, an answer to the question: AI -- what is it good for?

#artificialintelligence

That headline might seem a bit churlish, given the tremendous amount of energy, investment, and hype in the AI space, as well as undeniable evidence of technological progress. After all, AI today can beat any human in games ranging from chess to Starcraft (DeepMind's AlphaZero and AlphaStar); it can write a B- college history essay in seconds with a few prompts (OpenAI's GPT-3); it can draw on-demand illustrations of surprising creativity and quality (OpenAI's DALL-E 2). For AI proponents like Sam Altman, OpenAI's CEO, these advances herald an era where "AI creative tools are going to be the biggest impact on creative work flows since the computer itself," as he tweeted last month. That may turn out to be true. But in the here and now, I'm still left somewhat underwhelmed.