Generative AI
OpenAI built gaming bots that can work as a team with inhuman precision
When humans and artificial intelligence face off in a game, like chess or Go, it's typically a one-against-one affair. Each player, human or AI, just has to outsmart a single opponent on a board that only changes when the players make a move. OpenAI is announcing today (June 25) that its newest AI bots can hold their own as a team of five against human gamers at Dota 2, a multiplayer game popular in e-sports for its complexity and its need for teamwork. The AI research lab is looking to take the bots to Dota 2 championship matches in August to compete against the pros. Dota 2 is a challenging game for AI to master simply because of the number of decisions that the players have to juggle. While chess can end in fewer than 40 moves, and Go fewer than 150, OpenAI's Dota 2 bots make 20,000 moves over the course of a 45-minute game.
Deep Generative Models with Learnable Knowledge Constraints
Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Liang, Xiaodan, Qin, Lianhui, Dong, Haoye, Xing, Eric
The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic to apply to any DGMs, and is flexible to adapt arbitrary constraints with the model jointly. Experiments on human image generation and templated sentence generation show models with learned knowledge constraints by our algorithm greatly improve over base generative models.
Variational Bi-domain Triplet Autoencoder
Kuznetsova, Rita, Bakhteev, Oleg
We investigate deep generative models, which allow us to use training data from one domain to build a model for another domain. We consider domains that have similar structure (texts, images). We propose the Variational Bi-domain Triplet Autoencoder (VBTA), which learns a joint distribution of objects from different domains. There are many cases when obtaining any supervision (e.g. paired data) is difficult or ambiguous. For such cases we seek a method that is able to extract information about data relation and structure from the latent space. We extend the VBTA's objective function with relative constraints, or triplets, that are sampled from the shared latent space across domains. In other words, we combine the deep generative model with metric learning ideas in order to improve the final objective with the triplet information. We demonstrate the performance of the VBTA model on different tasks: bi-directional image generation and image-to-image translation, even on unpaired data. We also provide a qualitative analysis. We show that the VBTA model is comparable to, and in some cases outperforms, existing generative models.
Reinforcement Q-Learning from Scratch in Python with OpenAI Gym - LearnDataSci
Essentially, Q-learning lets the agent use the environment's rewards to learn, over time, the best action to take in a given state. In our Taxi environment, we have the reward table, P, that the agent will learn from. It does this by receiving a reward for taking an action in the current state, then updating a Q-value to remember whether that action was beneficial. The values stored in the Q-table are called Q-values, and they map to a (state, action) combination. A Q-value for a particular state-action combination is representative of the "quality" of an action taken from that state.
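The update described above can be sketched with the standard tabular Q-learning rule. This is a minimal illustration and not the article's own code: the function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the toy state and action numbers are all assumptions made for the example.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.6):
    """Apply one tabular Q-learning update and return the new Q-value.

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values taken from the article.
    """
    # Best Q-value reachable from the next state (0.0 for unseen pairs).
    best_next = max(q_table[(next_state, a)] for a in range(n_actions))
    old_value = q_table[(state, action)]
    # Blend the old estimate with the reward plus discounted future value.
    new_value = (1 - alpha) * old_value + alpha * (reward + gamma * best_next)
    q_table[(state, action)] = new_value
    return new_value


# Hypothetical toy transitions: state 0, action 1, landing in state 2.
q_table = defaultdict(float)
first = q_update(q_table, state=0, action=1, reward=-1.0,
                 next_state=2, n_actions=4)
second = q_update(q_table, state=0, action=1, reward=20.0,
                  next_state=2, n_actions=4)
```

After a penalty the Q-value for (state 0, action 1) drops below zero, and after a later large reward it rises again, which is how the table comes to "remember" whether an action was beneficial.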
Improving Language Understanding with Unsupervised Learning
Our system works in two stages; first we train a transformer model on a very large amount of data in an unsupervised manner -- using language modeling as a training signal -- then we fine-tune this model on much smaller supervised datasets to help it solve specific tasks. We developed this approach following our sentiment neuron work, in which we noted that unsupervised learning techniques can yield surprisingly discriminative features when trained on enough data. Here, we wanted to further explore this idea: can we develop one model, train it in an unsupervised way on a large amount of data, and then fine-tune the model to achieve good performance on many different tasks? Our results indicate that this approach works surprisingly well; the same core model can be fine-tuned for very different tasks with minimal adaptation. This work builds on the approach introduced in Semi-supervised Sequence Learning, which showed how to improve document classification performance by using unsupervised pre-training of an LSTM followed by supervised fine-tuning.
OpenAI Recruiting Fellows
The third way to get involved with OpenAI is as an OpenAI Scholar. Under this program OpenAI is providing 6-10 stipends and mentorship to individuals from underrepresented groups to study deep learning full-time for 3 months and open-source a project. This is a remote program and is open to anyone with US work authorization located in US timezones. In return, scholars are asked to document their experiences of studying deep learning and hopefully inspire others to do the same.
Stochastic seismic waveform inversion using generative adversarial networks as a geological prior
Mosser, Lukas, Dubrule, Olivier, Blunt, Martin J.
We present an application of deep generative models in the context of partial-differential equation (PDE) constrained inverse problems. We combine a generative adversarial network (GAN) representing an a priori model that creates subsurface geological structures and their petrophysical properties, with the numerical solution of the PDE governing the propagation of acoustic waves within the earth's interior. We perform Bayesian inversion using an approximate Metropolis-adjusted Langevin algorithm (MALA) to sample from the posterior given seismic observations. Gradients with respect to the model parameters governing the forward problem are obtained by solving the adjoint of the acoustic wave equation. Gradients of the mismatch with respect to the latent variables are obtained by leveraging the differentiable nature of the deep neural network used to represent the generative model. We show that approximate MALA sampling allows efficient Bayesian inversion of model parameters obtained from a prior represented by a deep generative model, obtaining a diverse set of realizations that reflect the observed seismic response.
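For readers unfamiliar with the sampler, a single-chain Metropolis-adjusted Langevin algorithm can be sketched as follows. This is a toy illustration only: it targets a one-dimensional standard normal instead of a GAN latent space with a wave-equation likelihood, it uses the exact accept/reject correction and not the approximate variant the authors describe, and every name in it (`log_target`, `grad_log_target`, `mala_sample`, `step`) is hypothetical.

```python
import math
import random


def log_target(z):
    # Toy log-density (standard normal, up to a constant); an assumption.
    return -0.5 * z * z


def grad_log_target(z):
    # Gradient of the toy log-density with respect to z.
    return -z


def log_proposal(z_to, z_from, step):
    # Log-density (up to a constant) of the Langevin proposal z_from -> z_to.
    mean = z_from + 0.5 * step ** 2 * grad_log_target(z_from)
    return -((z_to - mean) ** 2) / (2.0 * step ** 2)


def mala_sample(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    z = 0.0
    samples = []
    for _ in range(n_samples):
        # Gradient-informed proposal: drift towards high density plus noise.
        proposal = (z + 0.5 * step ** 2 * grad_log_target(z)
                    + step * rng.gauss(0.0, 1.0))
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_accept = (log_target(proposal) + log_proposal(z, proposal, step)
                      - log_target(z) - log_proposal(proposal, z, step))
        if rng.random() < math.exp(min(0.0, log_accept)):
            z = proposal
        samples.append(z)
    return samples


samples = mala_sample(5000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

In the setting of the abstract, the gradient would instead come from the adjoint of the wave equation chained through the generator network, which is far more expensive than this toy gradient.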
r/MachineLearning - [D] How do we extract features from an LSTM Language Model
I recently read the "Learning to generate reviews and discovering sentiment" paper by OpenAI and found it to be super cool. But I could not understand how they are using the language model as a feature extractor. Suppose we have 150 characters in a review, how do we extract features from these 150 characters when our input is 64 characters at a time?
Ratio Matching MMD Nets: Low dimensional projections for effective deep generative models
Srivastava, Akash, Xu, Kai, Gutmann, Michael U., Sutton, Charles
Deep generative models can learn to generate realistic-looking images on several natural image datasets, but many of the most effective methods are adversarial methods, which require careful balancing of training between a generator network and a discriminator network. Maximum mean discrepancy networks (MMD-nets) avoid this issue using the kernel trick, but unfortunately they have not on their own been able to match the performance of adversarial training. We present a new method of training MMD-nets, based on learning a mapping of samples from the data and from the model into a lower dimensional space, in which MMD training can be more effective. We call these networks ratio matching MMD networks (RM-MMDnets). We train the mapping to preserve density ratios between the densities over the low-dimensional space and the original space. This ensures that matching the model distribution to the data in the low-dimensional space will also match the original distributions. We show that RM-MMDnets have better performance and better stability than recent adversarial methods for training MMD-nets.
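As background for the quantity these networks minimize, the squared maximum mean discrepancy between two sample sets can be sketched with a Gaussian kernel. This is a plain (biased) MMD estimate, not the ratio matching projection proposed in the paper; the function names, the `bandwidth` parameter, and the toy points are assumptions made for the example.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel between two points given as tuples of floats.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    # Average kernel value over all pairs drawn from xs and ys.
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    # Biased estimate: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


# Hypothetical toy samples standing in for "data" and "model" outputs.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

same = mmd_squared(data, data)
different = mmd_squared(data, shifted)
```

The estimate is zero when both sample sets coincide and grows as they separate, which is why it can serve as a training signal without a separately trained discriminator.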
Deep Generative Models for Distribution-Preserving Lossy Compression
Tschannen, Michael, Agustsson, Eirikur, Lucic, Mario
We propose and study the problem of distribution-preserving lossy compression. Motivated by the recent advances in extreme image compression, which make it possible to maintain artifact-free reconstructions even at very low bitrates, we propose to optimize the rate-distortion tradeoff under the constraint that the reconstructed samples follow the distribution of the training data. Such a compression system recovers both ends of the spectrum: at zero bitrate it learns a generative model of the data, and at high enough bitrates it achieves perfect reconstruction. Furthermore, for intermediate bitrates it smoothly interpolates between matching the distribution of the training data and perfectly reconstructing the training samples. We study several methods to approximately solve the proposed optimization problem, including a novel combination of Wasserstein GAN and Wasserstein Autoencoder, and present strong theoretical and empirical results for the proposed compression system.