Deep Learning
Unsupervised feature learning for audio classification using convolutional deep belief networks
Lee, Honglak, Pham, Peter, Largman, Yan, Ng, Andrew Y.
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. For the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations trained from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks.
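As a rough illustration of the convolutional building block, here is a minimal sketch of how a one-dimensional convolutional RBM might compute hidden-unit activation probabilities along a time axis. It assumes binary hidden units, a single filter, a scalar shared bias, and "valid" sliding dot products. The function names, the toy signal, and the filter values are hypothetical. The model in the paper (multiple channels, pooling, real-valued visible units) is more involved.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def crbm_hidden_probs(v: List[float], w: List[float], b: float) -> List[float]:
    """P(h[j] = 1 | v) for one filter of a 1-D convolutional RBM.

    The filter is slid over every "valid" position of the input (a sliding dot
    product, i.e. cross-correlation), and the same weights and bias are shared
    across all positions.
    """
    n, k = len(v), len(w)
    if k > n:
        raise ValueError("filter is longer than the input")
    probs = []
    for j in range(n - k + 1):
        response = sum(w[t] * v[j + t] for t in range(k))
        probs.append(sigmoid(response + b))
    return probs


# Toy 1-D "signal" (e.g. one frequency channel over time) and a hypothetical filter.
signal = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
print(crbm_hidden_probs(signal, [1.0, -1.0], 0.0))
```

Weight sharing across positions is what lets one filter detect the same local pattern wherever it occurs in time.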
Measuring Invariances in Deep Networks
Goodfellow, Ian, Lee, Honglak, Le, Quoc V., Saxe, Andrew, Ng, Andrew Y.
For many computer vision applications, the ideal image feature would be invariant to multiple confounding image properties, such as illumination and viewing angle. Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, outside of using these learning algorithms in a classifier, they can sometimes be difficult to evaluate. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different image transforms. We find that deep autoencoders become invariant to increasingly complex image transformations with depth. This further justifies the use of "deep" vs. "shallower" representations. Our performance metrics agree with existing measures of invariance. Our evaluation metrics can also be used to evaluate future work in unsupervised deep learning, and thus help the development of future algorithms.
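One way to make "directly measure invariance" concrete is a firing-rate ratio. The sketch below is in the spirit of such a test, but it is a simplified assumption rather than the paper's exact protocol. A unit is said to "fire" when its activation exceeds a threshold. Its score is the firing rate on transformed versions of its preferred stimuli divided by its firing rate on a broad set of inputs. The toy unit, the inputs, and the threshold are hypothetical.

```python
from typing import Callable, List, Sequence

Unit = Callable[[Sequence[float]], float]


def firing_rate(unit: Unit, inputs: List[Sequence[float]], threshold: float) -> float:
    """Fraction of inputs on which the unit's activation exceeds `threshold`."""
    if not inputs:
        raise ValueError("need at least one input")
    return sum(unit(x) > threshold for x in inputs) / len(inputs)


def invariance_score(unit: Unit, all_inputs: List[Sequence[float]],
                     transformed_preferred: List[Sequence[float]], threshold: float) -> float:
    """Local firing rate (on transformed versions of the unit's preferred stimuli)
    divided by global firing rate (on a broad set of inputs)."""
    global_rate = firing_rate(unit, all_inputs, threshold)
    if global_rate == 0.0:
        raise ValueError("unit never fires on the global input set")
    local_rate = firing_rate(unit, transformed_preferred, threshold)
    return local_rate / global_rate


def toy_unit(x: Sequence[float]) -> float:
    # Responds to total intensity, so it ignores where the active pixel is.
    return sum(x)


all_inputs = [[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
shifted = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # translations of its preferred stimulus
print(invariance_score(toy_unit, all_inputs, shifted, threshold=0.5))  # 4.0
```

Dividing by the global rate guards against a unit that looks "invariant" only because it fires on almost everything.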
Kernel Methods for Deep Learning
Cho, Youngmin, Saul, Lawrence K.
We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets.
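The kernels introduced in this paper are the arc-cosine kernels. The sketch below computes them for degrees n = 0, 1, 2, together with their multilayer composition, where each layer reuses the previous layer's kernel values. It is an illustrative implementation written from the standard closed forms, not the authors' code, and it deliberately rejects zero vectors for simplicity.

```python
import math
from typing import Sequence


def _j(n: int, theta: float) -> float:
    """Angular part J_n(theta) of the arc-cosine kernel for degrees 0, 1, 2."""
    if n == 0:
        return math.pi - theta
    if n == 1:
        return math.sin(theta) + (math.pi - theta) * math.cos(theta)
    if n == 2:
        return (3.0 * math.sin(theta) * math.cos(theta)
                + (math.pi - theta) * (1.0 + 2.0 * math.cos(theta) ** 2))
    raise ValueError("only degrees n = 0, 1, 2 are implemented in this sketch")


def arc_cosine_kernel(x: Sequence[float], y: Sequence[float], n: int = 1, layers: int = 1) -> float:
    """Arc-cosine kernel of degree n, composed `layers` times.

    Each layer maps the previous kernel values k(x,x), k(y,y), k(x,y) to
    (1/pi) * (k(x,x) * k(y,y))**(n/2) * J_n(theta), where theta is the angle
    implied by the previous kernel values.
    """
    kxx = sum(a * a for a in x)
    kyy = sum(b * b for b in y)
    kxy = sum(a * b for a, b in zip(x, y))
    if kxx == 0.0 or kyy == 0.0:
        raise ValueError("zero vectors are not handled in this sketch")
    for _ in range(layers):
        cos_theta = max(-1.0, min(1.0, kxy / math.sqrt(kxx * kyy)))  # clamp rounding error
        theta = math.acos(cos_theta)
        new_kxy = (kxx * kyy) ** (n / 2) * _j(n, theta) / math.pi
        # A point's angle with itself is 0, so the self-similarities use J_n(0).
        kxx, kyy = kxx ** n * _j(n, 0.0) / math.pi, kyy ** n * _j(n, 0.0) / math.pi
        kxy = new_kxy
    return kxy


print(arc_cosine_kernel([1.0, 0.0], [0.0, 1.0], n=0))            # 0.5
print(arc_cosine_kernel([1.0, 0.0], [0.0, 1.0], n=0, layers=2))  # about 0.6667
```

For n = 1, a single layer gives k(x, x) = ||x||^2, which is a quick sanity check on the normalization.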
Slow, Decorrelated Features for Pretraining Complex Cell-like Networks
Bengio, Yoshua, Bergstra, James S.
We introduce a new type of neural network activation function based on recent physiological rate models for complex cells in visual area V1. A neural network with a single hidden layer of such units achieves 1.5% error on MNIST. We also propose using an existing criterion for learning slow, decorrelated features as a pretraining strategy for image models. This pretraining strategy results in orientation-selective features, similar to the receptive fields of complex cells. With this pretraining, the same single-hidden-layer model achieves better generalization error, even though the pretraining sample distribution is very different from the fine-tuning distribution. To implement this pretraining strategy, we derive a fast algorithm for online learning of decorrelated features such that each iteration of the algorithm runs in linear time with respect to the number of features.
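To make "slow, decorrelated" concrete, the sketch below scores a sequence of feature vectors with two generic terms. Slowness is the mean squared change between consecutive time steps. Decorrelation is the sum of squared correlations between distinct features. This is an assumed, naive batch formulation costing O(d^2) per evaluation in the number of features d, so it is not the paper's linear-time online algorithm. The weight `lam` is a hypothetical trade-off parameter.

```python
import math
from typing import List, Sequence


def slowness(features: List[Sequence[float]]) -> float:
    """Mean squared change between consecutive feature vectors (lower means slower)."""
    if len(features) < 2:
        raise ValueError("need at least two time steps")
    total = 0.0
    for prev, cur in zip(features, features[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, cur))
    return total / (len(features) - 1)


def decorrelation_penalty(features: List[Sequence[float]]) -> float:
    """Sum of squared correlations over distinct feature pairs (naive O(d^2) batch version)."""
    n, d = len(features), len(features[0])
    means = [sum(f[i] for f in features) / n for i in range(d)]
    centered = [[f[i] - means[i] for i in range(d)] for f in features]
    stds = [math.sqrt(sum(c[i] ** 2 for c in centered) / n) for i in range(d)]
    penalty = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            if stds[i] == 0.0 or stds[j] == 0.0:
                continue  # correlation is undefined for a constant feature
            cov = sum(c[i] * c[j] for c in centered) / n
            penalty += (cov / (stds[i] * stds[j])) ** 2
    return penalty


def slow_decorrelated_objective(features: List[Sequence[float]], lam: float) -> float:
    # lam is a hypothetical trade-off weight between the two terms.
    return slowness(features) + lam * decorrelation_penalty(features)
```

Without the decorrelation term, the slowness term alone would be minimized by making every feature constant or identical, which is why the two are paired.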
New Millennium AI and the Convergence of History
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a by-product of the way humans allocate memory space to past events?
DeSTIN: A Scalable Deep Learning Architecture with Application to High-Dimensional Robust Pattern Recognition
Arel, Itamar (The University of Tennessee) | Rose, Derek (The University of Tennessee) | Coop, Robert (The University of Tennessee)
The topic of deep learning systems has received significant attention during the past few years, particularly as a biologically inspired approach to processing high-dimensional signals. Such signals often involve spatiotemporal information that may span large scales, which makes representing them highly challenging in the general case. Deep learning networks attempt to overcome this challenge by means of a hierarchical architecture composed of common circuits with similar (and often cortically influenced) functionality. The goal of such systems is to represent sensory observations in a manner that will later facilitate robust pattern classification, mimicking a key attribute of the mammalian brain. This stands in contrast with the mainstream approach of pre-processing the data to reduce its dimensionality, a paradigm that often results in sub-optimal performance. This paper presents the Deep SpatioTemporal Inference Network (DeSTIN), a scalable deep learning architecture that relies on a combination of unsupervised learning and Bayesian inference. Dynamic pattern learning forms an inherent way of capturing complex spatiotemporal dependencies. Simulation results demonstrate the core capabilities of the proposed framework, particularly in the context of high-dimensional signal classification.
Large-Margin kNN Classification Using a Deep Encoder Network
Min, Martin Renqiang, Stanley, David A., Yuan, Zineng, Bonner, Anthony, Zhang, Zhaolei
kNN is one of the most popular classification methods, but it often fails to work well with an inappropriate choice of distance metric or in the presence of numerous class-irrelevant features. Linear feature transformation methods have been widely applied to extract class-relevant information and improve kNN classification, but they are very limited in many applications. Kernels have been used to learn powerful non-linear feature transformations, but these methods fail to scale to large datasets. In this paper, we present a scalable non-linear feature mapping method based on a deep neural network pretrained with restricted Boltzmann machines for improving kNN classification in a large-margin framework, which we call DNet-kNN. DNet-kNN can be used both for classification and for supervised dimensionality reduction. The experimental results on two benchmark handwritten digit datasets show that DNet-kNN performs much better than large-margin kNN using a linear mapping and than kNN based on a deep autoencoder pretrained with restricted Boltzmann machines.
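A large-margin kNN objective can be illustrated with a triplet hinge loss computed in the learned feature space. Each point's same-class "target neighbors" should be closer than every differently labelled point by a margin. The sketch below uses this LMNN-style formulation as an assumption. DNet-kNN's exact objective may differ, and the embedded points here stand in for the deep encoder's outputs, which are not modeled.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def target_neighbors(emb: List[Sequence[float]], labels: List[int], i: int, k: int) -> List[int]:
    """Indices of the k nearest same-class points to point i (excluding i itself)."""
    same = [j for j in range(len(emb)) if j != i and labels[j] == labels[i]]
    same.sort(key=lambda j: sq_dist(emb[i], emb[j]))
    return same[:k]


def large_margin_loss(emb: List[Sequence[float]], labels: List[int],
                      k: int = 1, margin: float = 1.0) -> float:
    """Sum of hinge terms max(0, margin + d(i, j) - d(i, m)) over target neighbors j
    and differently labelled points m, using squared Euclidean distances."""
    loss = 0.0
    for i in range(len(emb)):
        for j in target_neighbors(emb, labels, i, k):
            d_ij = sq_dist(emb[i], emb[j])
            for m in range(len(emb)):
                if labels[m] != labels[i]:
                    loss += max(0.0, margin + d_ij - sq_dist(emb[i], emb[m]))
    return loss


# Hypothetical encoder outputs: two well-separated classes give zero loss.
print(large_margin_loss([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]], [0, 0, 1, 1]))
```

In a full system, this loss would be back-propagated through the pretrained encoder so that the mapping itself creates the margin.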
Sparse Feature Learning for Deep Belief Networks
Ranzato, Marc'Aurelio, Boureau, Y-Lan, LeCun, Yann
Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable than the raw input as input to a supervised machine. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g., low dimension or sparsity). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and training them sequentially, high-order dependencies between the input variables can be captured.
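The trade-off between reconstruction error and how much the code carries can be illustrated with a generic sparse autoencoder loss. This sketch assumes a sigmoid encoder, a decoder that reuses the encoder weights transposed, and an L1 penalty on the code. These choices are illustrative assumptions, not the paper's exact machine or its information-content measure. Returning the two terms separately makes it easy to trace the trade-off as `lam` varies.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def encode(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Code z = sigmoid(W x + b); W has one row per code unit."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]


def decode(W: Matrix, z: Sequence[float]) -> List[float]:
    """Reconstruction x_hat = W^T z, reusing the encoder weights (tied weights)."""
    d = len(W[0])
    return [sum(W[k][i] * z[k] for k in range(len(z))) for i in range(d)]


def sparse_autoencoder_loss(W: Matrix, b: Sequence[float], x: Sequence[float],
                            lam: float) -> Tuple[float, float, float]:
    """Return (total, reconstruction error, L1 sparsity) for one input vector."""
    z = encode(W, b, x)
    x_hat = decode(W, z)
    recon = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    sparsity = sum(abs(zk) for zk in z)
    return recon + lam * sparsity, recon, sparsity


# Hypothetical all-zero weights: every code unit sits at 0.5 and nothing is reconstructed.
print(sparse_autoencoder_loss([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 2.0], lam=0.1))
```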
Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
Hinton, Geoffrey E., Salakhutdinov, Ruslan R.
We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by Hinton et al. If the data is high-dimensional and highly structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.
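The core idea, a Gaussian kernel evaluated on top-layer DBN features instead of raw inputs, can be sketched as follows. This sketch assumes the layer weights have already been learned and propagates inputs deterministically through sigmoid layers using mean activations. The layer values and the `length_scale` parameter are hypothetical, and neither greedy pretraining nor fine-tuning is shown.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases); hypothetical format


def sigmoid_layer(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]


def top_features(layers: List[Layer], x: Sequence[float]) -> List[float]:
    """Propagate x deterministically through a stack of sigmoid layers."""
    h = list(x)
    for W, b in layers:
        h = sigmoid_layer(W, b, h)
    return h


def gaussian_kernel(a: Sequence[float], b: Sequence[float], length_scale: float = 1.0) -> float:
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def dbn_kernel_matrix(layers: List[Layer], xs: List[Sequence[float]],
                      length_scale: float = 1.0) -> List[List[float]]:
    """Covariance matrix for a GP: Gaussian kernel on top-layer features."""
    feats = [top_features(layers, x) for x in xs]
    return [[gaussian_kernel(f, g, length_scale) for g in feats] for f in feats]


# One hypothetical 2 -> 2 sigmoid layer standing in for a trained DBN.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])]
print(dbn_kernel_matrix(layers, [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]))
```

With an empty layer list, the function reduces to an ordinary Gaussian kernel on the raw inputs, which is the baseline the abstract compares against.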
Sparse deep belief net model for visual area V2
Lee, Honglak, Ekanadham, Chaitanya, Ng, Andrew Y.
Motivated in part by the hierarchical organization of cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or "deep," structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first-layer responses in the data. Specifically, it picks up both collinear ("contour") features and corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex "corner" features matches well with the results from Ito and Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.
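A common way to make a DBN layer sparse is to add a regularizer that pushes each hidden unit's average activation over a batch toward a small target value p. The sketch below assumes the squared-deviation form lam * sum_j (p - mean over examples of P(h_j = 1 | v))^2. Treat this form, and the names `target` and `lam`, as assumptions for illustration rather than a restatement of the paper's exact training objective.

```python
from typing import List, Sequence


def sparsity_penalty(hidden_probs: List[Sequence[float]], target: float, lam: float) -> float:
    """lam * sum_j (target - mean activation of hidden unit j)^2.

    hidden_probs[l][j] is P(h_j = 1 | v^(l)) for example l in a batch of m examples.
    """
    m = len(hidden_probs)
    if m == 0:
        raise ValueError("empty batch")
    n_hidden = len(hidden_probs[0])
    penalty = 0.0
    for j in range(n_hidden):
        mean_act = sum(row[j] for row in hidden_probs) / m
        penalty += (target - mean_act) ** 2
    return lam * penalty


# Unit 0 is already silent; unit 1 is always on and gets penalized toward the target.
print(sparsity_penalty([[0.0, 1.0], [0.0, 1.0]], target=0.0, lam=1.0))  # 1.0
```

Because the penalty acts on average activations rather than on each individual response, a unit can still respond strongly to the few inputs it is tuned to.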