Generative AI
These virtual robot arms get smarter by training each other
A virtual robot arm has learned to solve a wide range of different puzzles--stacking blocks, setting the table, arranging chess pieces--without having to be retrained for each task. It did this by playing against a second robot arm that was trained to give it harder and harder challenges. Self play: Developed by researchers at OpenAI, the identical robot arms--Alice and Bob--learn by playing a game against each other in a simulation, without human input. The robots use reinforcement learning, a technique in which AIs learn by trial and error which actions to take in different situations to achieve certain goals. The game involves moving objects around on a virtual tabletop.
A radish in a tutu walking a dog? This AI can draw it really well
An artist can draw a baby daikon radish wearing a tutu and walking a dog, even if they've never seen one before. But this kind of visual mashup has long been a trickier task for computers. Now, a new artificial-intelligence model can create such images with clarity -- and cuteness. This week nonprofit research company OpenAI released DALL-E, which can generate a slew of impressive-looking, often surrealistic images from written prompts such as "an armchair in the shape of an avocado" or "a painting of a capybara sitting in a field at sunrise." (And yes, the name DALL-E is a portmanteau referencing surrealist artist Salvador Dalí and animated sci-fi film "WALL-E.") A new AI model from OpenAI, DALL-E, can create pictures from the text prompt "an illustration of a baby daikon radish in a tutu walking a dog".
OpenAI's DALL-E app generates images from just a description
OpenAI, the company co-founded by Elon Musk and backed by Microsoft, has already mastered Dota 2 and the art of writing fake news. Now, it has reached another milestone with DALL-E (a portmanteau of "Wall-E" and "Dali"), an AI app that can create an image out of nearly any description. For example, if you ask for "a cat made of sushi" or a "high quality illustration of a giraffe turtle chimera," it will deliver those things, often with startlingly good quality (and sometimes not). DALL-E can create images based on a description of its attributes, like "a pentagonal green clock," or "a collection of glasses is sitting on a table." In the latter example, it places both drinking and eye glasses on a table with varying degrees of success.
AI illustrator draws imaginative pictures to go with text captions
A neural network uses text captions to create outlandish images -- such as armchairs in the shape of avocados -- demonstrating it understands how language shapes visual culture. OpenAI, an artificial intelligence company that recently partnered with Microsoft, developed the neural network, which it calls DALL-E. It is a version of the company's GPT-3 language model that can create expansive written works based on short text prompts, but DALL-E produces images instead. "The world isn't just text," says Ilya Sutskever, co-founder of OpenAI. "Humans don't just talk: we also see. A lot of important context comes from looking."
This avocado armchair could be the future of AI
For all GPT-3's flair, its output can feel untethered from reality, as if it doesn't know what it's talking about. By grounding text in images, researchers at OpenAI and elsewhere are trying to give language models a better grasp of the everyday concepts that humans use to make sense of things. DALL·E and CLIP come at this problem from different directions. At first glance, CLIP (Contrastive Language-Image Pre-training) is yet another image recognition system. Except that it has learned to recognize images not from labeled examples in curated data sets, as most existing models do, but from images and their captions taken from the internet.
StarNet: Gradient-free Training of Deep Generative Models using Determined System of Linear Equations
Zadeh, Amir, Benoit, Santiago, Morency, Louis-Philippe
In this paper we present an approach for training deep generative models solely based on solving determined systems of linear equations. A network that uses this approach, called a StarNet, has the following desirable properties: 1) training requires no gradient, as the solution to the system of linear equations is not stochastic, 2) it is highly scalable when solving the system of linear equations w.r.t. the latent codes, and similarly for the parameters of the model, and 3) it gives desirable least-squares bounds for the estimation of latent codes and network parameters within each layer.
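As a rough illustration of what fitting parameters by solving a determined linear system can look like, the sketch below recovers the weights of a single linear unit from as many (latent code, target) pairs as there are weights, with no gradient step. It is not the StarNet algorithm from the paper: the function name `solve_linear_system`, the toy codes, and the targets are all assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b for a square matrix `a` using Gaussian elimination
    with partial pivoting. Hypothetical helper, not taken from the paper."""
    n = len(a)
    # Build an augmented matrix [a | b] so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; the system is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


# Toy data (assumed): two 2-dimensional latent codes and the output a
# linear unit should produce for each. Two equations, two unknown weights.
codes = [[1.0, 2.0], [3.0, 1.0]]
targets = [5.0, 5.0]

weights = solve_linear_system(codes, targets)
predictions = [sum(w * c for w, c in zip(weights, code)) for code in codes]
```

Because the system has exactly as many equations as unknowns and the matrix is invertible, the solution is unique and the same on every run, which is the sense in which such a fit involves no stochastic gradient updates.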
The Future is Here! Have You Checked OpenAI's GPT-3 Yet?
Ever wonder how close AI has gotten to impersonating human beings? The latest GPT-3 can code computer programs, compose tweets, summarize emails, write news, answer questions, translate languages, and write fiction and poetry too. It can take on virtually any English-language task. In a recent milestone shown on YouTube, it created an app that functions similarly to Instagram. Dubbed one of the most important advancements in AI in recent years, GPT-3, or Generative Pre-Trained Transformer 3, has raised the AI goal posts many notches toward the stratosphere.
Building LEGO Using Deep Generative Models of Graphs
Thompson, Rylee, Ghalebi, Elahe, DeVries, Terrance, Taylor, Graham W.
Generative models are now used to create a variety of high-quality digital artifacts. Yet their use in designing physical objects has received far less attention. In this paper, we advocate for the construction toy, LEGO, as a platform for developing generative models of sequential assembly. We develop a generative model based on graph-structured neural networks that can learn from human-built structures and produce visually compelling designs. Our code is released at: https://github.
Unsupervised Learning of Global Factors in Deep Generative Models
Peis, Ignacio, Olmos, Pablo M., Artés-Rodríguez, Antonio
We present a novel deep generative model based on non-i.i.d. variational autoencoders that captures global dependencies among observations in a fully unsupervised fashion. In contrast to the recent semi-supervised alternatives for global modeling in deep generative models, our approach combines a mixture model in the local or data-dependent space and a global Gaussian latent variable, which leads us to three particular insights. First, the induced latent global space captures interpretable disentangled representations with no user-defined regularization in the evidence lower bound (as in $\beta$-VAE and its generalizations). Second, we show that the model performs domain alignment to find correlations and interpolate between different databases. Finally, we study the ability of the global space to discriminate between groups of observations with non-trivial underlying structures, such as face images with shared attributes or defined sequences of digit images.
StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding
Zeng, Jinshan, Chen, Qi, Liu, Yunxin, Wang, Mingwen, Yao, Yuan
The generation of stylish Chinese fonts is an important problem in many applications. Most existing generation methods are based on deep generative models, particularly those built on generative adversarial networks (GANs). However, these deep generative models may suffer from mode collapse, which significantly degrades the diversity and quality of the generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and then incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. The result is an efficient method called StrokeGAN, mainly motivated by the observation that the stroke encoding contains a substantial amount of the mode information of Chinese characters. In order to reconstruct the one-bit stroke encoding of the associated generated characters, we introduce a stroke-encoding reconstruction loss imposed on the discriminator. Equipped with this one-bit stroke encoding and stroke-encoding reconstruction loss, the mode collapse issue of CycleGAN can be significantly alleviated, with improved preservation of strokes and diversity of generated characters. The effectiveness of StrokeGAN is demonstrated by a series of generation tasks over nine datasets with different fonts. The numerical results demonstrate that StrokeGAN generally outperforms the state-of-the-art methods in terms of content and recognition accuracies, as well as certain stroke error, and also generates more realistic characters.
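The idea of a one-bit stroke encoding can be sketched in a few lines: each position in a fixed-length binary vector records whether a given basic stroke appears in a character. The six-stroke inventory, the function names, and the use of mean squared error as the reconstruction loss are all assumptions for illustration; the abstract does not specify the stroke set or the exact loss used in StrokeGAN.

```python
# Hypothetical stroke inventory for illustration only; the abstract does not
# list the strokes the authors actually use.
STROKES = ["horizontal", "vertical", "left-falling", "right-falling", "dot", "hook"]


def one_bit_stroke_encoding(strokes_in_char):
    """Return a 0/1 vector with one bit per stroke type: 1 if the character
    contains that stroke at least once, 0 otherwise."""
    present = set(strokes_in_char)
    unknown = present - set(STROKES)
    if unknown:
        raise ValueError(f"unknown stroke types: {sorted(unknown)}")
    return [1 if stroke in present else 0 for stroke in STROKES]


def stroke_reconstruction_loss(predicted, target):
    """Mean squared error between a predicted stroke vector and the target
    encoding. MSE is an assumed stand-in for the paper's loss."""
    if len(predicted) != len(target):
        raise ValueError("vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# The character 十 is written with one horizontal and one vertical stroke.
target = one_bit_stroke_encoding(["horizontal", "vertical"])

# A made-up discriminator output for a generated glyph: it is fairly sure
# about the horizontal stroke but unsure about the vertical one.
predicted = [0.9, 0.4, 0.1, 0.0, 0.0, 0.0]
loss = stroke_reconstruction_loss(predicted, target)
```

A generated glyph that drops or invents strokes yields a predicted vector far from the target, so a loss of this kind penalizes outputs that lose the character's stroke structure.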