Generative AI
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Watter, Manuel, Springenberg, Jost, Boedecker, Joschka, Riedmiller, Martin
We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images. E2C consists of a deep generative model, belonging to the family of variational autoencoders, that learns to generate image trajectories from a latent space in which the dynamics is constrained to be locally linear. Our model is derived directly from an optimal control formulation in latent space, supports long-term prediction of image sequences and exhibits strong performance on a variety of complex control problems. Papers published at the Neural Information Processing Systems Conference.
Conditional Generative Moment-Matching Networks
Ren, Yong, Zhu, Jun, Li, Jialian, Luo, Yucen
Maximum mean discrepancy (MMD) has been successfully applied to learn deep generative models for characterizing a joint distribution of variables via kernel mean embedding. In this paper, we present conditional generative moment-matching networks (CGMMN), which learn a conditional distribution given some input variables based on a conditional maximum mean discrepancy (CMMD) criterion. The learning is performed by stochastic gradient descent with the gradient calculated by back-propagation. We evaluate CGMMN on a wide range of tasks, including predictive modeling, contextual generation, and Bayesian dark knowledge, which distills knowledge from a Bayesian model by learning a relatively small CGMMN student network. Our results demonstrate competitive performance in all the tasks.
Semi-crowdsourced Clustering with Deep Generative Models
Luo, Yucen, TIAN, TIAN, Shi, Jiaxin, Zhu, Jun, Zhang, Bo
We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade-off between its complexity and fitting data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework.
Continual Learning with Deep Generative Replay
Shin, Hanul, Lee, Jung Kwon, Kim, Jaehong, Kim, Jiwon
Attempts to train a comprehensive artificial intelligence capable of solving multiple tasks have been impeded by a chronic problem called catastrophic forgetting. Although simply replaying all previous data alleviates the problem, it requires large memory and even worse, often infeasible in real world applications where the access to past data is limited. Inspired by the generative nature of the hippocampus as a short-term memory system in primate brain, we propose the Deep Generative Replay, a novel framework with a cooperative dual model architecture consisting of a deep generative model ("generator") and a task solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task. We test our methods in several sequential learning settings involving image classification tasks.
Stabilizing Training of Generative Adversarial Networks through Regularization
Roth, Kevin, Lucchi, Aurelien, Nowozin, Sebastian, Hofmann, Thomas
Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters. This fragility is in part due to a dimensional mismatch or non-overlapping support between the model distribution and the data distribution, causing their density ratio and the associated f -divergence to be undefined. We overcome this fundamental limitation and propose a new regularization approach with low computational cost that yields a stable GAN training procedure. We demonstrate the effectiveness of this regularizer accross several architectures trained on common benchmark image generation tasks. Our regularization turns GAN models into reliable building blocks for deep learning.
Learning semantic similarity in a continuous space
We address the problem of learning semantic representation of questions to measure similarity between pairs as a continuous distance metric. Our work naturally extends Word Mover's Distance (WMD) [1] by representing text documents as normal distributions instead of bags of embedded words. Our learned metric measures the dissimilarity between two questions as the minimum amount of distance the intent (hidden representation) of one question needs to "travel" to match the intent of another question. We first learn to repeat, reformulate questions to infer intents as normal distributions with a deep generative model [2] (variational auto encoder). Semantic similarity between pairs is then learned discriminatively as an optimal transport distance metric (Wasserstein 2) with our novel variational siamese framework.
Deepfakes and deep media: A new security battleground
That's troubling not only because these fakes might be used to sway opinions during an election or implicate a person in a crime, but because they've already been abused to generate pornographic material of actors and defraud a major energy producer. In anticipation of this new reality, a coalition of academic institutions, tech firms, and nonprofits are developing ways to spot misleading AI-generated media. Their work suggests that detection tools are a viable short-term solution but that the deepfake arms race is just beginning. The best AI-produced prose used to be closer to Mad Libs than The Grapes of Wrath, but cutting-edge language models can now write with humanlike pith and cogency. San Francisco research firm OpenAI's GPT-2 takes seconds to craft passages in the style of a New Yorker article or brainstorm game scenarios.
Deep S$^3$PR: Simultaneous Source Separation and Phase Retrieval Using Deep Generative Models
Metzler, Christopher A., Wetzstein, Gordon
This paper introduces and solves the simultaneous source separation and phase retrieval (S$^3$PR) problem. S$^3$PR shows up in a number application domains, most notably computational optics, where one has multiple independent coherent sources whose phase is difficult to measure. In general, S$^3$PR is highly under-determined, non-convex, and difficult to solve. In this work, we demonstrate that by restricting the solutions to lie in the range of a deep generative model, we can constrain the search space sufficiently to solve S$^3$PR.
Google's New ML Fairness Gym To Track Down Bias In AI
Human societies are extremely complex. The cultural, racial and geographical differences around the globe and the lack of curated data make'fairness' in technology a huge challenge. Now, in an attempt to track the long term societal impacts of artificial intelligence, Google researchers recently released a machine learning fairness gym. They have done this by using Google's OpenAI Gym. OpenAI's Gym is a toolkit for developing and comparing reinforcement learning algorithms and is compatible with any numerical computation library, such as TensorFlow or Theano.
Out-of-Distribution Detection with Distance Guarantee in Deep Generative Models
Zhang, Yufeng, Liu, Wanwei, Chen, Zhenbang, Wang, Ji, Liu, Zhiming, Li, Kenli, Wei, Hongmei, Chen, Zuoning
Recent research has shown that it is challenging to detect out-of-distribution (OOD) data in deep generative models including flow-based models and variational autoencoders (VAEs). In this paper, we prove a theorem that, for a well-trained flow-based model, the distance between the distribution of representations of an OOD dataset and prior can be large enough, as long as the distance between the distributions of the training dataset and the OOD dataset is large enough. Furthermore, our observation shows that, for flow-based model and VAE with factorized prior, the representations of OOD datasets are more correlated than that of the training dataset. Based on our theorem and observation, we propose detecting OOD data according to the total correlation of representations in flow-based model and VAE. Experimental results show that our method can achieve nearly 100\% AUROC for all the widely used benchmarks and has robustness against data manipulation. While the state-of-the-art method performs not better than random guessing for challenging problems and can be fooled by data manipulation in almost all cases.