Goto

Collaborating Authors

 Deep Learning


Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities

arXiv.org Machine Learning

Recently, we proposed to transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero output and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. We continue the work by firstly introducing a third transformation to normalize the scale of the outputs of each hidden neuron, and secondly by analyzing the connections to second order optimization methods. We show that the transformations make a simple stochastic gradient behave closer to second-order optimization methods and thus speed up learning. This is shown both in theory and with experiments. The experiments on the third transformation show that while it further increases the speed of learning, it can also hurt performance by converging to a worse local optimum, where both the inputs and outputs of many hidden neurons are close to zero.


Boltzmann Machines and Denoising Autoencoders for Image Denoising

arXiv.org Machine Learning

Image denoising based on a probabilistic model of local image patches has been employed by various researchers, and recently a deep (denoising) autoencoder has been proposed by Burger et al. [2012] and Xie et al. [2012] as a good model for this. In this paper, we propose that another popular family of models in the field of deep learning, called Boltzmann machines, can perform image denoising as well as, or in certain cases of high level of noise, better than denoising autoencoders. We empirically evaluate the two models on three different sets of images with different types and levels of noise. Throughout the experiments we also examine the effect of the depth of the models. The experiments confirmed our claim and revealed that the performance can be improved by adding more hidden layers, especially when the level of noise is high.


Denoising Deep Neural Networks Based Voice Activity Detection

arXiv.org Machine Learning

Recently, the deep-belief-networks (DBN) based voice activity detection (VAD) has been proposed. It is powerful in fusing the advantages of multiple features, and achieves the state-of-the-art performance. However, the deep layers of the DBN-based VAD do not show an apparent superiority to the shallower layers. In this paper, we propose a denoising-deep-neural-network (DDNN) based VAD to address the aforementioned problem. Specifically, we pre-train a deep neural network in a special unsupervised denoising greedy layer-wise mode, and then fine-tune the whole network in a supervised way by the common back-propagation algorithm. In the pre-training phase, we take the noisy speech signals as the visible layer and try to extract a new feature that minimizes the reconstruction cross-entropy loss between the noisy speech signals and its corresponding clean speech signals. Experimental results show that the proposed DDNN-based VAD not only outperforms the DBN-based VAD but also shows an apparent performance improvement of the deep layers over shallower layers.


Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint

arXiv.org Machine Learning

Deep Belief Networks (DBN) have been successfully applied on popular machine learning tasks. Specifically, when applied on hand-written digit recognition, DBNs have achieved approximate accuracy rates of 98.8%. In an effort to optimize the data representation achieved by the DBN and maximize their descriptive power, recent advances have focused on inducing sparse constraints at each layer of the DBN. In this paper we present a theoretical approach for sparse constraints in the DBN using the mixed norm for both non-overlapping and overlapping groups. We explore how these constraints affect the classification accuracy for digit recognition in three different datasets (MNIST, USPS, RIMES) and provide initial estimations of their usefulness by altering different parameters such as the group size and overlap percentage.


High-Dimensional Probability Estimation with Deep Density Models

arXiv.org Machine Learning

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations lie on a lower-dimensional manifold of high probability. It has been more difficult, however, to exploit this insight to build explicit, tractable density models for high-dimensional data. In this paper, we introduce the deep density model (DDM), a new approach to density estimation. We exploit insights from deep learning to construct a bijective map to a representation space, under which the transformation of the distribution of the data is approximately factorized and has identical and known marginal densities. The simplicity of the latent distribution under the model allows us to feasibly explore it, and the invertibility of the map to characterize contraction of measure across it. This enables us to compute normalized densities for out-of-sample data. This combination of tractability and flexibility allows us to tackle a variety of probabilistic tasks on high-dimensional datasets, including: rapid computation of normalized densities at test-time without evaluating a partition function; generation of samples without MCMC; and characterization of the joint entropy of the data.


Layer-wise learning of deep generative models

arXiv.org Machine Learning

When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that they train a lower bound of this criterion. We test the new learning procedure against a state of the art method (stacked RBMs), and find it to improve performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data).


Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks

arXiv.org Artificial Intelligence

Deep Max-Pooling Convolutional Neural Networks are Deep Neural Networks (DNN) with convolutional and max-pooling layers. Convolutional Neural Networks (CNN) can be traced back to the Neocognitron [1] in 1980. They were first successfully applied to relatively small tasks such as digit recognition [2], image interpretation [3] and object recognition [4]. Back then their size was greatly limited by the low computational power of available hardware. Since 2010, however, DNN have greatly profited from Graphics Processing Units (GPU). Simple GPU-based multilayer perceptrons (MLP) establised new state of the art results [5] on the MNIST handwritten digit dataset [4] when made both deep and large (augmenting the training set by artificial samples helped to avoid overfitting).


Linear-Nonlinear-Poisson Neuron Networks Perform Bayesian Inference On Boltzmann Machines

arXiv.org Artificial Intelligence

One conjecture in both deep learning and classical connectionist viewpoint is that the biological brain implements certain kinds of deep networks as its back-end. However, to our knowledge, a detailed correspondence has not yet been set up, which is important if we want to bridge between neuroscience and machine learning. Recent researches emphasized the biological plausibility of Linear-Nonlinear-Poisson (LNP) neuron model. We show that with neurally plausible settings, the whole network is capable of representing any Boltzmann machine and performing a semi-stochastic Bayesian inference algorithm lying between Gibbs sampling and variational inference.


Stochastic Pooling for Regularization of Deep Convolutional Neural Networks

arXiv.org Machine Learning

We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.


Recklessly Approximate Sparse Coding

arXiv.org Machine Learning

It has recently been observed that certain extremely simple feature encoding techniques are able to achieve state of the art performance on several standard image classification benchmarks including deep belief networks, convolutional nets, factored RBMs, mcRBMs, convolutional RBMs, sparse autoencoders and several others. Moreover, these "triangle" or "soft threshold" encodings are ex- tremely efficient to compute. Several intuitive arguments have been put forward to explain this remarkable performance, yet no mathematical justification has been offered. The main result of this report is to show that these features are realized as an approximate solution to the a non-negative sparse coding problem. Using this connection we describe several variants of the soft threshold features and demonstrate their effectiveness on two image classification benchmark tasks.