Generative AI
Here's what Elon Musk's secretive AI company is working on
Elon Musk has not been shy about his concerns over artificial intelligence turning evil. So it wasn't a surprise in December when Musk announced the formation of OpenAI, an open-source non-profit focused on advancing "digital intelligence in the way that is most likely to benefit humanity as a whole." That's all well and good, but not much has been revealed about what exactly OpenAI is working on. OpenAI's co-founder and CTO told Tech Insider that the company is primarily focused on advancing machine learning, the technology that enables computers to learn how to complete tasks through experience. Specifically, it is concentrating on two key types of machine learning that every major tech company is investing in right now.
OpenAI, Hyperscalers See GPU Accelerated Future for Deep Learning
As a former research scientist at Google, Ian Goodfellow has had a direct hand in some of the more complex, promising frameworks set to power the future of deep learning in the coming years. He spent his early years at the search giant chipping away at TensorFlow and adding new capabilities, including a new element of the deep learning stack called generative adversarial networks. As part of the Google Brain team, he furthered this work and continued to optimize machine learning algorithms used by Google and, now, the wider world. Goodfellow has since moved on to the non-profit OpenAI, where he is further refining what might be possible with generative adversarial networks. The mission of OpenAI is to develop open source tools to further many of the application areas that were showcased this week at the GPU Technology Conference in San Jose, where the emphasis was placed squarely on the future of deep learning and, of course, the role that Nvidia's accelerators will play in the training and execution of neural networks and other machine learning workloads.
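The adversarial setup itself is easy to state (standard background, not detail from the article): a generator G maps random noise to candidate samples, a discriminator D is trained to tell those samples apart from real data, and the two play the minimax game

min_G max_D  E_{x ~ p_data}[ log D(x) ] + E_{z ~ p_z}[ log(1 - D(G(z))) ],

so improving the discriminator sharpens the training signal that pushes the generator toward producing samples indistinguishable from the data.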
OpenAI hires a bunch of variational dudes. • /r/MachineLearning
There's a wide class of generative models for which variational methods are the only known practical way to do inference. This includes basically any model with black-box ("neural") dependence relations, and many others as well, e.g., Bayesian nonparametrics for any significant dataset size. The point of variational methods is not to calculate partition functions (although you do get that as a side effect); the point is to fit sophisticated models that have complex latent structure, which does yield improvements across pretty much any metric you'd care about.
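For readers who want the objective being referred to, the standard variational bound (general background, not something spelled out in the post) for a model p(x, z) with latent variables z and an approximate posterior q(z | x) is

log p(x) >= E_{q(z|x)}[ log p(x, z) - log q(z | x) ],

and maximizing the right-hand side jointly fits the model and the approximate posterior over the complex latent structure; an estimate of the marginal likelihood (the "partition function" side effect mentioned above) comes along for free.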
Note on the equivalence of hierarchical variational models and auxiliary deep generative models
This note compares two recently published machine learning methods for constructing flexible, but tractable families of variational hidden-variable posteriors. The first method, called "hierarchical variational models" enriches the inference model with an extra variable, while the other, called "auxiliary deep generative models", enriches the generative model instead. We conclude that the two methods are mathematically equivalent.
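A compact way to see the comparison (a sketch in generic notation, not text from the note): hierarchical variational models keep the generative model p(x, z) and enrich the posterior with an auxiliary variable a, q(z, a | x) = q(a | x) q(z | a, x), which requires an extra distribution r(a | x, z) to keep the bound tractable; auxiliary deep generative models instead enrich the generative side, p(x, z, a) = p(z) p(a | x, z) p(x | z), and use the same factorized inference network. Writing out the two evidence lower bounds shows they coincide whenever r(a | x, z) and p(a | x, z) are identified with one another, which is the sense in which the two constructions are equivalent.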
Max-Margin Deep Generative Models
Li, Chongxuan, Zhu, Jun, Shi, Tianlin, Zhang, Bo
Deep generative models (DGMs) are effective at learning multilayered representations of complex data and performing inference on input data by exploiting their generative ability. However, little work has been done on examining or strengthening the discriminative ability of DGMs for making accurate predictions. This paper presents max-margin deep generative models (mmDGMs), which exploit the strongly discriminative principle of max-margin learning to improve the discriminative power of DGMs while retaining their generative capability. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objective. Empirical results on the MNIST and SVHN datasets demonstrate that (1) max-margin learning can significantly improve the prediction performance of DGMs while retaining their generative ability; and (2) mmDGMs are competitive with state-of-the-art fully discriminative networks when deep convolutional neural networks (CNNs) are employed as both recognition and generative models.
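Schematically (a paraphrase of the construction in generic notation, not the paper's exact formulation), the mmDGM objective couples the variational bound of the generative model with a multiclass hinge loss on the inferred latent code z_n,

min_{theta, phi, w}  -ELBO(theta, phi) + C * sum_n max_y [ Delta(y, y_n) + w_y^T E_q[z_n] - w_{y_n}^T E_q[z_n] ]_+ ,

so the generative term keeps the representation faithful to the data while the piecewise linear max-margin term, weighted by a hyperparameter C, separates the classes; this is the piecewise linear objective that the doubly stochastic subgradient algorithm optimizes.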
Automatic Relevance Determination For Deep Generative Models
Karaletsos, Theofanis, Rätsch, Gunnar
A recurring problem when building probabilistic latent variable models is regularization and model selection, for instance, the choice of the dimensionality of the latent space. In the context of belief networks with latent variables, this problem has been addressed with Automatic Relevance Determination (ARD) employing Monte Carlo inference. We present a variational inference approach to ARD for Deep Generative Models, using doubly stochastic variational inference to provide fast and scalable learning. We show empirical results on a standard dataset illustrating the effects of contracting the latent space automatically, and we show that the resulting latent representations are significantly more compact without loss of the expressive power of the learned models.
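For context (standard ARD, stated in generic notation rather than the paper's): an automatic relevance determination prior gives each latent dimension its own precision, p(z | alpha) = prod_d N(z_d; 0, 1/alpha_d), with a hyperprior on each alpha_d. Dimensions the data do not support are driven to large alpha_d and effectively switched off, which is the mechanism by which the latent space contracts automatically; here that inference is carried out with doubly stochastic variational updates rather than Monte Carlo.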
On the Expressive Efficiency of Sum Product Networks
Martens, James, Medabalimi, Venkatesh
Sum Product Networks (SPNs) are a recently developed class of deep generative models which compute their associated unnormalized density functions using a special type of arithmetic circuit. When certain sufficient conditions, called the decomposability and completeness conditions (or "D&C" conditions), are imposed on the structure of these circuits, marginal densities and other useful quantities, which are typically intractable for other deep generative models, can be computed by what amounts to a single evaluation of the network (which is a property known as "validity"). However, the effect that the D&C conditions have on the capabilities of D&C SPNs is not well understood. In this work we analyze the D&C conditions, expose the various connections that D&C SPNs have with multilinear arithmetic circuits, and consider the question of how well they can capture various distributions as a function of their size and depth. Among our various contributions is a result which establishes the existence of a relatively simple distribution with fully tractable marginal densities which cannot be efficiently captured by D&C SPNs of any depth, but which can be efficiently captured by various other deep generative models. We also show that with each additional layer of depth permitted, the set of distributions which can be efficiently captured by D&C SPNs grows in size. This kind of "depth hierarchy" property has been widely conjectured to hold for various deep models, but has never been proven for any of them. Some of our other contributions include a new characterization of the D&C conditions as sufficient and necessary ones for a slightly strengthened notion of validity, and various state-machine characterizations of the types of computations that can be performed efficiently by D&C SPNs.
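A toy example makes the D&C conditions concrete (a minimal hand-built network with arbitrary weights, written for illustration and not taken from the paper): each product node splits its variables into disjoint sets (decomposability), each sum node mixes children over the same variables (completeness), and a single bottom-up evaluation then yields exact marginals.

# Minimal sum-product network over two binary variables X1, X2.
# Weights and leaf parameters are arbitrary illustration values.

def leaf(p_true, value):
    """Bernoulli leaf: return P(X = value); value=None marginalizes X out."""
    if value is None:
        return 1.0  # a normalized leaf sums to 1 over its states
    return p_true if value == 1 else 1.0 - p_true

def spn(x1, x2):
    """Root = 0.6 * (A1(X1) * A2(X2)) + 0.4 * (B1(X1) * B2(X2)).
    Products are decomposable (children over disjoint variables) and the
    sum is complete (children over the same scope), so the network is valid."""
    prod_a = leaf(0.9, x1) * leaf(0.2, x2)
    prod_b = leaf(0.3, x1) * leaf(0.7, x2)
    return 0.6 * prod_a + 0.4 * prod_b

# Validity: the joint sums to 1 over all complete assignments.
print(sum(spn(a, b) for a in (0, 1) for b in (0, 1)))   # 1.0 (up to rounding)

# Marginal P(X1 = 1) from a single evaluation with X2 marginalized out,
# matching the explicit sum over X2.
print(spn(1, None))               # 0.66
print(spn(1, 0) + spn(1, 1))      # 0.66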
Semi-supervised Learning with Deep Generative Models
Kingma, Durk P., Mohamed, Shakir, Rezende, Danilo Jimenez, Welling, Max
The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
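In compressed form (a sketch in standard notation, not wording from the abstract), the construction pairs a generative model p(x, y, z) = p(y) p(z) p(x | y, z) with an inference network q(z | x, y) and a classifier q(y | x). Labelled pairs contribute a variational bound L(x, y) on log p(x, y); for unlabelled x the label is treated as an extra latent variable and summed out, U(x) = sum_y q(y | x) ( L(x, y) - log q(y | x) ); and an additional alpha * E[ log q(y | x) ] term over the labelled data trains the classifier directly, so unlabelled examples shape both the representation and, through the marginalization, the classifier itself.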
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
Rezende, Danilo Jimenez, Mohamed, Shakir, Wierstra, Daan
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent approximate posterior distributions, which acts as a stochastic encoder of the data. We develop stochastic back-propagation -- rules for back-propagation through stochastic variables -- and use this to derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data and is a useful tool for high-dimensional data visualisation.
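The piece of machinery this rests on, writing a Gaussian latent variable as a deterministic transform of its parameters and an independent noise source so that gradients pass through the sampling step, can be sketched in a few lines (a generic illustration of the rule, not code from the paper):

import numpy as np

rng = np.random.default_rng(0)

def sample_z(mu, log_sigma):
    """Reparameterized Gaussian sample: z = mu + sigma * eps, eps ~ N(0, I).
    Because z is a deterministic function of (mu, log_sigma) given eps,
    back-propagation through the stochastic node reduces to ordinary
    back-propagation through this transform."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps, eps

mu = np.array([0.5, -1.0])
log_sigma = np.array([0.0, -0.5])
z, eps = sample_z(mu, log_sigma)

# Gradient of z with respect to log_sigma under the reparameterization:
grad_z_wrt_log_sigma = np.exp(log_sigma) * eps
print(z, grad_z_wrt_log_sigma)

In a full implementation the same z would be fed to the generative model, and the gradient of the variational bound would be estimated by averaging such per-sample gradients over draws of eps.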