

 Generative AI


Compositional Sculpting of Iterative Generative Processes

arXiv.org Artificial Intelligence

High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated and must satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations on pairs of distributions: the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$), along with the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
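To make the two operations concrete, here is a minimal sketch on a toy discrete distribution. The abstract does not state the formulas, so the code assumes $(p_1 \otimes p_2)(x) \propto p_1(x)p_2(x)/(p_1(x)+p_2(x))$ and $(p_1 \unicode{x25D1}\,p_2)(x) \propto p_1(x)^2/(p_1(x)+p_2(x))$. It also illustrates why classifier guidance fits. If $c_i(x) = p_i(x)/(p_1(x)+p_2(x))$ is the probability that an ideal classifier, trained on an equal mixture of the two components, assigns $x$ to component $i$, then both operations reduce to the mixture reweighted by classifier outputs. All function names are hypothetical. Real GFlowNets and diffusion models apply this guidance step by step during sampling rather than to an explicit probability table.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def harmonic_mean(p1, p2):
    """Assumed definition: (p1 ⊗ p2)(x) ∝ p1(x) p2(x) / (p1(x) + p2(x))."""
    return normalize([a * b / (a + b) if a + b > 0 else 0.0
                      for a, b in zip(p1, p2)])


def contrast(p1, p2):
    """Assumed definition: (p1 ◑ p2)(x) ∝ p1(x)^2 / (p1(x) + p2(x))."""
    return normalize([a * a / (a + b) if a + b > 0 else 0.0
                      for a, b in zip(p1, p2)])


def via_classifier(p1, p2, k1, k2):
    """Reweight the equal mixture m = (p1 + p2) / 2 by c1^k1 * c2^k2.

    c_i(x) = p_i(x) / (p1(x) + p2(x)) is the ideal classifier's probability
    that x came from component i. (k1, k2) = (1, 1) reproduces the harmonic
    mean; (2, 0) reproduces the contrast.
    """
    weights = []
    for a, b in zip(p1, p2):
        if a + b == 0:
            weights.append(0.0)  # outside the support of both components
            continue
        m = (a + b) / 2
        c1, c2 = a / (a + b), b / (a + b)
        weights.append(m * c1 ** k1 * c2 ** k2)
    return normalize(weights)


# Toy example over three outcomes x0, x1, x2.
p1 = [0.5, 0.5, 0.0]
p2 = [0.0, 0.5, 0.5]
print(harmonic_mean(p1, p2))  # → [0.0, 1.0, 0.0]: mass only where both agree
print(contrast(p1, p2))       # roughly [2/3, 1/3, 0]: favors what p1 has and p2 lacks
```

The harmonic mean keeps only outcomes that both components support, while the contrast shifts mass toward outcomes that are likely under $p_1$ but not under $p_2$.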


OpenAI Seeks New Valuation of Up To $90 Billion in Sale of Existing Shares

WSJ.com: WSJD - Technology

OpenAI is talking to investors about a possible share sale that would value the artificial-intelligence startup behind ChatGPT at between $80 billion and $90 billion, almost triple its level earlier this year, people familiar with the discussions said.


Creepy ChatGPT 'voice conversation' mimics a human with a convincing personality and knows almost everything

FOX News

OpenAI is rolling out the ability to carry on conversations with a human-sounding AI voice in the ChatGPT app. Alexa and Siri are about to get really jealous: the voice assistants built into smart speakers now face competition from a full-fledged conversational AI, available in the ChatGPT app to paying Plus customers. Starting this week, a new feature will be available in the ChatGPT apps for iOS and Android that could potentially eliminate the need for keyboards. Let's dive in and see exactly what is going to be at our fingertips.


How to use Stable Diffusion to create AI art on your PC

PCWorld

It's hard to miss how much attention AI image generators alone have attracted in recent months. With good reason, because they demonstrate the progress of deep learning models in a vivid and playful way. The journey went from the chaotic, random images generated with neural networks, which Google made accessible to the general public with Deep Dream in 2015, to the almost photorealistic images of generators such as OpenAI's DALL-E 2, Midjourney, and Stability AI's DreamStudio, which is built on Stable Diffusion. These generators are now available not only in the cloud but also on your own PC, provided it has enough computing power.


So Much for 'Learn to Code'

The Atlantic - Technology

The quickest way to second-guess a decision to major in English is this: have an extended family full of Salvadoran immigrants and pragmatic midwesterners. The ability to recite Chaucer in the original Middle English was unlikely to land me a job that would pay off my student loans and help me save for retirement, they suggested when I was a college freshman still figuring out my future. I stuck with English, but when my B.A. eventually spat me out into the thick of the Great Recession, I worried that they'd been right. After all, computer-science degrees, and certainly not English, have long been sold to college students as among the safest paths toward 21st-century job security. Coding jobs are plentiful across industries, and the pay is good, even after the tech layoffs of the past year.


The Morning After: Amazon bets $4 billion on an OpenAI rival

Engadget

Amazon's bid for AI glory runs into the billions. It's investing up to $4 billion in OpenAI rival Anthropic to provide advanced deep learning and other services for its Amazon Web Services (AWS) customers. Google has already invested $400 million in the company, which was founded by former OpenAI executives. Anthropic recently unveiled its first consumer-facing chatbot, Claude 2, accessible by subscription much like OpenAI's ChatGPT. Claude's Constitutional AI system is guided by 10 "foundational" principles of fairness and autonomy and is supposed to be harder to trick than other AI systems. Anthropic is already working on a chatbot it calls Claude-Next, which is supposed to be 10 times more powerful than any current AI.


OpenAI introduces voice and image prompts to ChatGPT

Al Jazeera

OpenAI is bringing audio and image capabilities to ChatGPT. The platform, which has long been limited to written prompts, will be adding the new features over the next two weeks to paid versions of the app, OpenAI announced in a blog post on Monday. Everyone else will be receiving the features "soon after". Users can have voice conversations with the chatbot, bringing it closer to popular AI assistants such as Apple's Siri and Amazon's Alexa. ChatGPT's new voice feature can also narrate bedtime stories, settle debates at the dinner table and speak out loud text input from users.


TechScape: AI-made images mean seeing is no longer believing

The Guardian

A strange thing happened last week when you searched for "tank man" on Google. Tap on image results and, instead of the usual photos of Tiananmen Square in Beijing and the iconic image of a brave protester staring down a convoy of tanks that was captured in 1989, the first result was the same historic moment – but from a different point of view. For a time last week, the first result on Google Images for "tank man" was instead an AI-generated image of the same protester, taking a selfie in front of the tank. The image was created by Midjourney and was at least six months old. First reported by 404 Media, a new tech journalism startup set up by former Vice News staff, the emergence of the tank man selfie – which Google subsequently removed from search results for "tank man" – illustrated one of the main fears that Eddie Perez, Twitter's former head of election integrity, raised with me in a recent podcast interview: with AI imagery, it's now possible to create alternative history. And that has huge ramifications not only for our lives but also for our elections.


ChatGPT can now answer out loud with five different synthesised voices when users talk to the AI chatbot

Daily Mail - Science & tech

Users can now talk out loud to the AI chatbot and it will answer back with its own synthesised voice. The feature is part of an upgrade to the mobile app and follows in the footsteps of voice assistants such as Amazon's Alexa and Apple's Siri. ChatGPT has been given five different voices, both male and female, that were trained on recordings of actors hired by OpenAI, the US company behind the technology. The firm claims they are far more realistic than rival voice assistants and is looking at allowing users to create their own in the future. Spotify has announced it is trialling the technology to translate podcasts into other languages, with an AI-generated imitation of the original host's voice.


Directed Diffusion: Direct Control of Object Placement through Attention Guidance

arXiv.org Artificial Intelligence

Text-guided diffusion models such as DALL-E 2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of very high quality. However, these models often struggle to compose scenes containing several key objects, such as characters in specified positional relationships. The missing capability to "direct" the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work, we take a particularly straightforward approach to providing the needed direction. Drawing on the observation that the cross-attention maps for prompt words reflect the spatial layout of objects denoted by those words, we introduce an optimization objective that produces "activation" at desired positions in these cross-attention maps. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. Directed Diffusion provides easy high-level positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines of code to implement.
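The abstract does not spell out the objective, so what follows is only a minimal sketch of the general idea under stated assumptions. The cross-attention map for one prompt word is modeled as a softmax over flattened spatial logits. A hypothetical loss, `placement_loss`, measures the fraction of attention mass that falls outside a target box, and gradient steps on the logits push "activation" into the box. This is a generic attention-mass objective, not necessarily the one Directed Diffusion uses. In the real method the edit happens inside a pre-trained denoiser during sampling; here the logits themselves stand in for whatever variables are actually optimized.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of spatial logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def box_mask(height, width, box):
    """Flattened 0/1 mask for a half-open box (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [1.0 if top <= i < bottom and left <= j < right else 0.0
            for i in range(height) for j in range(width)]


def placement_loss(logits, mask):
    """Hypothetical objective: fraction of attention mass outside the box."""
    attn = softmax(logits)
    inside = sum(a * m for a, m in zip(attn, mask))
    return 1.0 - inside


def guidance_step(logits, mask, lr):
    """One gradient-descent step on placement_loss with respect to the logits.

    For a = softmax(z) and L = 1 - sum_j m_j a_j, dL/dz_j = a_j (inside - m_j),
    so logits inside the box rise and logits outside it fall.
    """
    attn = softmax(logits)
    inside = sum(a * m for a, m in zip(attn, mask))
    return [z - lr * a * (inside - m) for z, a, m in zip(logits, attn, mask)]


# 4x4 attention map for one prompt word; target the top-left 2x2 quadrant.
H, W = 4, 4
mask = box_mask(H, W, (0, 0, 2, 2))
logits = [0.0] * (H * W)  # start from uniform attention
print(placement_loss(logits, mask))  # 0.75: 12 of 16 cells lie outside the box
for _ in range(50):
    logits = guidance_step(logits, mask, lr=5.0)
print(placement_loss(logits, mask))  # much smaller: attention has moved into the box
```

Several objects could be placed at once by summing one such loss per prompt word, each with its own box.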