Generative AI
Discovering Types for Entity Disambiguation
Using the top solution from our type system optimization, we can now label data from Wikipedia with the types that system defines. Using this data (in our experiments, 400M tokens each for English and French), we can train a bidirectional LSTM to independently predict all the type memberships for each word. On the Wikipedia source text we only have supervision on intra-wiki links; however, this is sufficient to train a deep neural network to predict type membership with an F1 of over 0.91. One of our type systems, discovered by beam search, includes types such as Aviation, Clothing, and Games, as well as surprisingly specific ones like 1754 in Canada (indicating that 1754 was an exciting year in the dataset of 1,000 Wikipedia articles it was trained on); the full type system is also available. Predicting entities in a document usually relies on a "coherence" metric between different entities, e.g.
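"Independently predict all the type memberships" makes this a multi-label problem: each type gets its own yes/no decision per word, not a softmax over types. The sketch below assumes per-token scores (logits) have already come out of some model, standing in for the BiLSTM. It thresholds each type's sigmoid at a hypothetical 0.5. It then computes micro-averaged F1 only over tokens that carry supervision, mirroring the fact that only intra-wiki links are labeled. The source does not say which F1 variant it reports, so micro-averaging is an assumption.

```python
import math
from typing import List, Optional, Set


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_types(logits: List[List[float]], threshold: float = 0.5) -> List[Set[int]]:
    """Threshold each type's sigmoid score independently for every token.

    logits[i][t] is the (hypothetical) model score for token i having type t.
    Returns, per token, the set of type indices predicted as members.
    """
    return [
        {t for t, score in enumerate(token_logits) if sigmoid(score) >= threshold}
        for token_logits in logits
    ]


def micro_f1(predicted: List[Set[int]], gold: List[Optional[Set[int]]]) -> float:
    """Micro-averaged F1 over supervised tokens only.

    gold[i] is None for tokens without supervision (e.g. text outside a
    wiki link); those tokens are skipped entirely.
    """
    pairs = [(p, g) for p, g in zip(predicted, gold) if g is not None]
    tp = sum(len(p & g) for p, g in pairs)
    fp = sum(len(p - g) for p, g in pairs)
    fn = sum(len(g - p) for p, g in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with three types (say Aviation, Clothing, Games), logits `[[2.0, -1.0, -3.0], [3.0, 3.0, 3.0], [-2.0, 0.5, 1.5]]` and gold `[{0}, None, {2}]`, the middle token is ignored. The third token's extra "Clothing" prediction is the only false positive, giving an F1 of 0.8.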
Zero-Shot Learning via Class-Conditioned Deep Generative Models
Wang, Wenlin (Duke University) | Pu, Yunchen (Duke University) | Verma, Vinay Kumar (IIT Kanpur) | Fan, Kai (Duke University) | Zhang, Yizhe (Duke University) | Chen, Changyou (SUNY at Buffalo) | Rai, Piyush (IIT Kanpur) | Carin, Lawrence (Duke University)
We present a deep generative model for Zero-Shot Learning (ZSL). Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. At test time, the label for an unseen-class test input is the class that maximizes the VAE lower bound. We further extend the model to (i) a semi-supervised/transductive setting, by leveraging unlabeled unseen-class data via an unsupervised learning module, and (ii) a few-shot learning setting, where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.
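The test-time rule, "the class that maximizes the VAE lower bound", can be made concrete for diagonal Gaussians. Suppose the decoder does not depend on the class, which is an assumption of this sketch. Then the reconstruction term is identical for every candidate, and maximizing the bound reduces to minimizing KL(q(z|x) || p(z|y)). Here the encoder output and the per-class priors are given as fixed numbers; in the model they would come from the encoder and from an attribute-conditioned mapping that is not shown.

```python
import math
from typing import Dict, List, Tuple

# (means, variances) of a Gaussian with diagonal covariance
Gaussian = Tuple[List[float], List[float]]


def kl_diag_gaussians(q: Gaussian, p: Gaussian) -> float:
    """Closed-form KL(q || p) between two diagonal Gaussians."""
    mu_q, var_q = q
    mu_p, var_p = p
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def predict_class(q_z_given_x: Gaussian, class_priors: Dict[str, Gaussian]) -> str:
    """Pick the class whose latent prior p(z|y) is closest (in KL) to q(z|x).

    Equivalent to maximizing the lower bound only under the assumption that
    the reconstruction term E_q[log p(x|z)] is the same for every class.
    """
    return min(class_priors, key=lambda c: kl_diag_gaussians(q_z_given_x, class_priors[c]))


# Hypothetical attribute-conditioned priors for two unseen classes.
priors = {
    "zebra": ([1.0, 1.0], [0.5, 0.5]),
    "whale": ([-1.0, 0.0], [0.5, 0.5]),
}
encoded_test_input = ([0.9, 0.8], [0.2, 0.2])  # hypothetical q(z|x)
print(predict_class(encoded_test_input, priors))  # zebra
```

Because unseen classes only enter through their priors, adding a new class at test time means adding one more entry to `priors`. No retraining of the encoder is involved.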
Semi-Supervised Learning From Crowds Using Deep Generative Models
Atarashi, Kyohei (Hokkaido University) | Oyama, Satoshi (Hokkaido University) | Kurihara, Masahito (RIKEN AIP)
Although supervised learning requires a labeled dataset, obtaining labels from experts is generally expensive. For this reason, crowdsourcing services are attracting attention in the field of machine learning as a way to collect labels at relatively low cost. However, the labels obtained by crowdsourcing, i.e., from non-expert workers, are often noisy. A number of methods have thus been devised for inferring true labels, and several methods have been proposed for learning classifiers directly from crowdsourced labels, referred to as "learning from crowds." A more practical problem is learning from crowdsourced labeled data and unlabeled data, i.e., "semi-supervised learning from crowds." This paper presents a novel generative model of the labeling process in crowdsourcing. It leverages unlabeled data effectively by introducing latent features and a data distribution. Because the data distribution can be complicated, we use a deep neural network for it; our model can therefore be regarded as a kind of deep generative model. The problems caused by the intractability of the latent-variable posteriors are solved by introducing an inference model. The experiments show that it outperforms four existing models, including a baseline model, on the MNIST dataset with simulated workers and on the Rotten Tomatoes movie review dataset with Amazon Mechanical Turk workers.
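One common way to write down a "generative model of the labeling process" uses per-worker confusion matrices, as in the classic Dawid–Skene approach. Each worker's answer depends only on the true label, and the true label has a prior. The sketch below computes the posterior over the true label for one item under that assumption. This is a swapped-in, simpler technique for illustration, not the paper's deep generative model. The prior would come from a classifier, and the confusion matrices are hypothetical numbers.

```python
from typing import Dict, List


def true_label_posterior(
    prior: List[float],
    worker_labels: Dict[str, int],
    confusion: Dict[str, List[List[float]]],
) -> List[float]:
    """Posterior p(true label | worker answers) for a single item.

    prior[k]            : p(true label = k), e.g. from a classifier.
    confusion[w][k][l]  : p(worker w answers l | true label k).
    worker_labels       : worker id -> label that worker gave; workers who
                          did not label this item are simply absent.
    Worker answers are assumed conditionally independent given the true label.
    """
    unnormalized = []
    for k, p_k in enumerate(prior):
        joint = p_k
        for worker, answer in worker_labels.items():
            joint *= confusion[worker][k][answer]
        unnormalized.append(joint)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical binary task: worker "a" is reliable, worker "b" is noisy.
confusion = {
    "a": [[0.9, 0.1], [0.1, 0.9]],
    "b": [[0.6, 0.4], [0.4, 0.6]],
}
posterior = true_label_posterior([0.5, 0.5], {"a": 1, "b": 0}, confusion)
print(posterior)  # the reliable worker's answer (label 1) dominates
```

When the two workers disagree, the reliable worker's vote outweighs the noisy one. Modeling workers this way, instead of taking a majority vote, is what lets such models exploit noisy crowd labels.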
Metrics for Deep Generative Models
Chen, Nutan, Klushyn, Alexej, Kurle, Richard, Jiang, Xueyan, Bayer, Justin, van der Smagt, Patrick
Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source (the latent space) into samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criteria of VAEs and GANs will make the latent space densely covered. Consequently, points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and offers an alternative to linear interpolation in latent space. In addition, it can be applied to robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth, on a simulated robot arm dataset, on human motion capture data, and on a generative model of handwritten digits.
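Under the metric induced by the generator, the length of a latent curve equals the length of its image in observation space. A discretized version just sums distances between consecutive generated points. The sketch uses a hypothetical toy generator that lifts a 2-D latent space onto a surface with a tall bump at the origin. The straight latent line between two points crosses the bump and ends up longer than a detour around it. This is the situation where Euclidean latent distance misleads. It only compares candidate curves; finding the actual shortest path would additionally require optimizing the curve, which is not shown.

```python
import math
from typing import Callable, List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def polyline(waypoints: List[Vector], steps: int) -> List[Vector]:
    """Discretize a piecewise-linear latent curve through the waypoints."""
    points: List[Vector] = []
    for a, b in zip(waypoints, waypoints[1:]):
        points.extend(lerp(a, b, i / steps) for i in range(steps))
    points.append(waypoints[-1])
    return points


def curve_length(points: List[Vector], generator: Callable[[Vector], Vector]) -> float:
    """Length of a discretized latent curve measured in observation space."""
    images = [generator(z) for z in points]
    return sum(math.dist(x0, x1) for x0, x1 in zip(images, images[1:]))


def bump_generator(z: Vector) -> Vector:
    # Hypothetical toy generator: a 2-D sheet with a tall bump at the origin.
    r2 = z[0] ** 2 + z[1] ** 2
    return [z[0], z[1], 10.0 * math.exp(-r2)]


straight = polyline([[-2.0, 0.0], [2.0, 0.0]], 200)
detour = polyline([[-2.0, 0.0], [0.0, 2.0], [2.0, 0.0]], 200)
print(curve_length(straight, bump_generator), curve_length(detour, bump_generator))
```

The detour is longer in latent space (about 5.66 versus 4) but shorter once lengths are measured through the generator.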
Latent Space Oddity: on the Curvature of Deep Generative Models
Arvanitidis, Georgios, Hansen, Lars Kai, Hauberg, Søren
Deep generative models provide a systematic way to learn nonlinear data distributions through a set of latent variables and a nonlinear "generator" function that maps latent points into the input space. The nonlinearity of the generator implies that the latent space gives a distorted view of the input space. Under mild conditions, we show that this distortion can be characterized by a stochastic Riemannian metric, and we demonstrate that distances and interpolants are significantly improved under this metric. This in turn improves probability distributions, sampling algorithms and clustering in the latent space. Our geometric analysis further reveals that current generators provide poor variance estimates and we propose a new generator architecture with vastly improved variance estimates. Results are demonstrated on convolutional and fully connected variational autoencoders, but the formalism easily generalizes to other deep generative models.
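For a generator with a mean function mu(z) and a standard-deviation function sigma(z), the expected metric can be written as M(z) = J_mu(z)^T J_mu(z) + J_sigma(z)^T J_sigma(z), where J denotes a Jacobian with respect to z. The sketch below evaluates that expression with central finite differences in plain Python. The toy `mu` and `sigma` functions in the example are hypothetical. The sigma term is what makes distances grow where the generator is uncertain, which is also why poor variance estimates distort the geometry.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def jacobian(f: Callable[[Vector], Vector], z: Vector, eps: float = 1e-5) -> Matrix:
    """Central finite-difference Jacobian: J[i][j] = d f_i / d z_j."""
    columns = []
    for j in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        columns.append([(a - b) / (2 * eps) for a, b in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


def gram(J: Matrix) -> Matrix:
    """Compute J^T J (a d x d matrix for a latent dimension d)."""
    d = len(J[0])
    return [[sum(row[i] * row[j] for row in J) for j in range(d)] for i in range(d)]


def expected_metric(mu: Callable[[Vector], Vector],
                    sigma: Callable[[Vector], Vector],
                    z: Vector) -> Matrix:
    """M(z) = J_mu^T J_mu + J_sigma^T J_sigma, evaluated numerically at z."""
    g_mu = gram(jacobian(mu, z))
    g_sigma = gram(jacobian(sigma, z))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(g_mu, g_sigma)]
```

As a usage example, take the hypothetical functions `mu(z) = [2*z[0], z[1]]` and `sigma(z) = [exp(z[0]), 0.5]`. At the origin the metric is approximately `[[5, 0], [0, 1]]`: 4 from the mean's stretching and 1 from the growing uncertainty along the first latent axis.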
OpenAI masters scale with Kubernetes on Microsoft Azure
OpenAI's mission is to build safe artificial general intelligence (AGI) and ensure AGI's benefits are as widely and evenly distributed as possible. As a non-profit AI research company, they focus on long-term research, working on problems that require fundamental advances in AI capabilities. OpenAI runs Kubernetes for their deep learning research because Kubernetes can provide a fast iteration cycle, scalability, and a lack of boilerplate, which makes it ideal for most of OpenAI's experiments. They currently operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which they pushed to over 2,500 nodes. That largest cluster runs in Azure on a combination of D15v2 and NC24 VMs.
Scaling Kubernetes to 2,500 Nodes
We've been running Kubernetes for deep learning research for over two years. While our largest-scale workloads manage bare cloud VMs directly, Kubernetes provides a fast iteration cycle, reasonable scalability, and a lack of boilerplate which makes it ideal for most of our experiments. We now operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which we've pushed to over 2,500 nodes. This cluster runs in Azure on a combination of D15v2 and NC24 VMs. On the path to this scale, many system components caused breakages, including etcd, the Kube masters, Docker image pulls, network, KubeDNS, and even our machines' ARP caches.
The Data-Driven Weekly #1.6
Right on cue, this past week brought the announcement of OpenAI, a new non-profit started by a number of tech luminaries to spearhead AI research that is publicly accessible. The motivation, apparently, is that these scions of capitalism lose faith in Adam Smith's invisible hand when it comes to AI R&D. Musk continues to promote the idea that AI will be humanity's largest existential threat. Challenging this view, the HBR asks if "OpenAI [is] Solving the Wrong Problem", pointing to the implied lack of trust in capitalism. This is similar to my own parry: that the biggest existential threat to humanity is humanity.
uber-common/deep-neuroevolution
Our code is based off of code from OpenAI, whom we thank. The original code and related paper from OpenAI can be found here. The repo has been modified to run both ES and our algorithms, including our Deep Genetic Algorithm (DeepGA), locally and on AWS. Note: The Humanoid experiment depends on Mujoco. If you plan to use the mujoco env, make sure to follow mujoco-py's readme about how to install mujoco correctly. The extra folder holds the XML specification file for the Humanoid Locomotion with Deceptive Trap domain used in https://arxiv.org/abs/1712.06560.