stochastic unit
Sequential Neural Models with Stochastic Layers
Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther
This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
d81f9c1be2e08964bf9f24b15f0e4900-Reviews.html
This paper proposes a neural network architecture that falls somewhere between multilayer perceptrons (MLPs) and sigmoid belief networks (SBNs). The motivation is to permit multimodal predictive distributions (like SBNs) by using stochastic hidden units, but adds deterministic hidden units to smooth the predictive distribution in the case of real-valued data. The paper's main technical contribution is an EM-style algorithm where the E-step uses importance sampling to approximate the posterior and the M-step uses backpropagation to update the parameters. The experiments demonstrate the model's utility on several synthetic and real datasets. Quality: I liked this paper; the use of stochastic and deterministic units seems reasonably justified.
Sequential Neural Models with Stochastic Layers Marco Fraccaro Søren Kaae Sønderby
This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
An Alternative to Backpropagation in Deep Reinforcement Learning
State-of-the-art deep learning algorithms mostly rely on gradient backpropagation to train a deep artificial neural network, which is generally regarded to be biologically implausible. For a network of stochastic units trained on a reinforcement learning task or a supervised learning task, one biologically plausible way of learning is to train each unit by REINFORCE. In this case, only a global reward signal has to be broadcast to all units, and the learning rule given is local, which can be interpreted as reward-modulated spike-timing-dependent plasticity (R-STDP) that is observed biologically. Although this learning rule follows the gradient of return in expectation, it suffers from high variance and cannot be used to train a deep network in practice. In this paper, we propose an algorithm called MAP propagation that can reduce this variance significantly while retaining the local property of learning rule. Different from prior works on local learning rules (e.g. Contrastive Divergence) which mostly applies to undirected models in unsupervised learning tasks, our proposed algorithm applies to directed models in reinforcement learning tasks. We show that the newly proposed algorithm can solve common reinforcement learning tasks at a speed similar to that of backpropagation when applied to an actor-critic network.
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
Huang, Chin-Wei, Dinh, Laurent, Courville, Aaron
In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.
Semi-Supervised Generation with Cluster-aware Generative Models
Maaløe, Lars, Fraccaro, Marco, Winther, Ole
Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Cluster-aware Generative Model, that uses unlabelled information to infer a latent representation that models the natural clustering of the data, and additional labelled data points to refine this clustering. The generative performances of the model significantly improve when labelled information is exploited, obtaining a log-likelihood of -79.38 nats on permutation invariant MNIST, while also achieving competitive semi-supervised classification accuracies. The model can also be trained fully unsupervised, and still improve the log-likelihood performance with respect to related methods.
Sequential Neural Models with Stochastic Layers
Fraccaro, Marco, Sønderby, Søren Kaae, Paquet, Ulrich, Winther, Ole
How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model’s posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
Sequential Neural Models with Stochastic Layers
Fraccaro, Marco, Sønderby, Søren Kaae, Paquet, Ulrich, Winther, Ole
How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
Stochastic Neural Networks with Monotonic Activation Functions
Ravanbakhsh, Siamak, Poczos, Barnabas, Schneider, Jeff, Schuurmans, Dale, Greiner, Russell
Siamak Ravanbakhsh, Barnab as P oczos, Jeff Schneider 1 and Dale Schuurmans, Russell Greiner 2 1 Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213 2 University of Alberta, Edmonton, AB T6G 2E8, Canada Abstract We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise. This paper investigates the application of this stochastic approximation in training a family of Restricted Boltzmann Machines (RBM) that are closely linked to Bregman divergences. This family, that we call exponential family RBM (Exp-RBM), is a subset of the exponential family Harmoniums that expresses family members through a choice of smooth monotonic non-linearity for each neuron. Using contrastive divergence along with our Gaussian approximation, we show that Exp-RBM can learn useful representations using novel stochastic units. 1 Introduction Deep neural networks (LeCun et al., 2015; Bengio, 2009) have produced some of the best results in complex pattern recognition tasks where the training data is abundant. Here, we are interested in deep learning for generative modeling. Recent years has witnessed a surge of interest in directed generative models that are trained using (stochastic) back-propagation ( e.g., Kingma and Welling, 2013; Rezende et al., 2014; Goodfellow et al., 2014). These models are distinct from deep energy-based models - including deep Boltzmann machine (Hinton et al., 2006) and (convolutional) deep belief networkAppearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. Although, due to their use of Gaussian noise, the stochastic units that we introduce in this paper can be potentially used with stochastic back-propagation, this paper is limited to applications in RBM.
Techniques for Learning Binary Stochastic Feedforward Neural Networks
Raiko, Tapani, Berglund, Mathias, Alain, Guillaume, Dinh, Laurent
Stochastic binary hidden units in a multi-layer perceptron (MLP) network give at least three potential benefits when compared to deterministic MLP networks. (1) They allow to learn one-to-many type of mappings. (2) They can be used in structured prediction problems, where modeling the internal structure of the output is important. (3) Stochasticity has been shown to be an excellent regularizer, which makes generalization performance potentially better in general. However, training stochastic networks is considerably more difficult. We study training using M samples of hidden activations per input. We show that the case M=1 leads to a fundamentally different behavior where the network tries to avoid stochasticity. We propose two new estimators for the training gradient and propose benchmark tests for comparing training algorithms. Our experiments confirm that training stochastic networks is difficult and show that the proposed two estimators perform favorably among all the five known estimators.