Goto

Collaborating Authors

 Deep Learning


Neural Variational Inference and Learning in Belief Networks

arXiv.org Machine Learning

Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. The model and this inference network are trained jointly by maximizing a variational lower bound on the log-likelihood. Although the naive estimator of the inference network gradient is too high-variance to be useful, we make it practical by applying several straightforward modelindependent variance reduction techniques. Applying our approach to training sigmoid belief networks and deep autoregressive networks, we show that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.


Stochastic Backpropagation and Approximate Inference in Deep Generative Models

arXiv.org Artificial Intelligence

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic back-propagation - rules for gradient backpropa-gation through stochastic variables - and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.


Deep AutoRegressive Networks

arXiv.org Machine Learning

We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.


Sparse Matrix Factorization

arXiv.org Machine Learning

We investigate the problem of factorizing a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions. This problem can be viewed as a simplification of the deep learning problem where finding a factorization corresponds to finding edges in different layers and values of hidden units. We prove that under certain assumptions for a sparse linear deep network with $n$ nodes in each layer, our algorithm is able to recover the structure of the network and values of top layer hidden units for depths up to $\tilde O(n^{1/6})$. We further discuss the relation among sparse matrix factorization, deep learning, sparse recovery and dictionary learning.


Comparing Raw Data and Feature Extraction for Seizure Detection with Deep Learning Methods

AAAI Conferences

Personalized health monitoring is slowly becoming a reality due to advances in small, high-fidelity sensors, low-power processors, as well as energy harvesting techniques. The ability to efficiently and effectively process this data and extract useful information is of the utmost importance. In this paper, we aim at dealing with this challenge for the application of automated seizure detection. We explore the use of a variety of representations and machine learning algorithms to the particular task of seizure detection in high-resolution, multi-channel EEG data. In doing so, we explore the classification accuracy, computational complexity and memory requirements with a view toward understanding which approaches are most suitable. In particular, we show that layered learning approaches such as Deep Belief Networks excel along these dimensions.


Training Restricted Boltzmann Machine by Perturbation

arXiv.org Machine Learning

A new approach to maximum likelihood learning of discrete graphical models and RBM in particular is introduced. Our method, Perturb and Descend (PD) is inspired by two ideas (I) perturb and MAP method for sampling (II) learning by Contrastive Divergence minimization. In contrast to perturb and MAP, PD leverages training data to learn the models that do not allow efficient MAP estimation. During the learning, to produce a sample from the current model, we start from a training data and descend in the energy landscape of the "perturbed model", for a fixed number of steps, or until a local optima is reached. For RBM, this involves linear calculations and thresholding which can be very fast. Furthermore we show that the amount of perturbation is closely related to the temperature parameter and it can regularize the model by producing robust features resulting in sparse hidden layer activation.


How to Construct Deep Recurrent Neural Networks

arXiv.org Machine Learning

In this paper, we explore different ways to extend a recurrent neural network (RNN) to a \textit{deep} RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper; (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental result supports our claim that the proposed deep RNNs benefit from the depth and outperform the conventional, shallow RNNs.


Discrete Restricted Boltzmann Machines

arXiv.org Machine Learning

A restricted Boltzmann machine (RBM) is a probabilistic graphical model with bipartite interactions between an observed set and a hidden set of units [see Smolensky, 1986, Freund and Haussler, 1991, Hinton, 2002, 2010]. A characterizing property of these models is that the observed units are independent given the states of the hidden units and vice versa. This is a consequence of the bipartiteness of the interaction graph and does not depend on the units' state spaces. Typically RBMs are defined with binary units, but other types of units have also been considered, including continuous, discrete, and mixed type units [see Welling et al., 2005, Marks and Movellan, 2001, Salakhutdinov et al., 2007, Dahl et al., 2012, Tran et al., 2011]. We study discrete RBMs, also called multinomial or softmax RBMs, which are special types of exponential family harmoniums [Welling et al., 2005].


A Deep Representation for Invariance And Music Classification

arXiv.org Machine Learning

Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream; modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed from similar classes, and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks, for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF - 1231216.


Neuronal Synchrony in Complex-Valued Deep Networks

arXiv.org Machine Learning

Deep learning has recently led to great successes in tasks such as image recognition (e.g Krizhevsky et al., 2012). However, deep networks are still outmatched by the power and versatility of the brain, perhaps in part due to the richer neuronal computations available to cortical circuits. The challenge is to identify which neuronal mechanisms are relevant, and to find suitable abstractions to model them. Here, we show how aspects of spike timing, long hypothesized to play a crucial role in cortical information processing, could be incorporated into deep networks to build richer, versatile representations. We introduce a neural network formulation based on complex-valued neuronal units that is not only biologically meaningful but also amenable to a variety of deep learning frameworks. Here, units are attributed both a firing rate and a phase, the latter indicating properties of spike timing. We show how this formulation qualitatively captures several aspects thought to be related to neuronal synchrony, including gating of information processing and dynamic binding of distributed object representations. Focusing on the latter, we demonstrate the potential of the approach in several simple experiments. Thus, neuronal synchrony could be a flexible mechanism that fulfills multiple functional roles in deep networks.