AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

Neyshabur, Behnam, Tomioka, Ryota, Srebro, Nathan

arXiv.org Artificial IntelligenceApr-16-2015

We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multi-layer feedforward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.

artificial intelligence, machine learning, regularization, (18 more...)

arXiv.org Artificial Intelligence

1412.6614

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Random Forests Can Hash

Qiu, Qiang, Sapiro, Guillermo, Bronstein, Alex

arXiv.org Machine LearningApr-16-2015

Hash codes are a very efficient data representation needed to be able to cope with the ever growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forest, a technique that together with deep learning have shown spectacular results in classification, can also be extended to large-scale retrieval. Traditional random forest fails to enforce the consistency of hashes generated from each tree for the same class data, i.e., to preserve the underlying similarity, and it also lacks a principled way for code aggregation across trees. We start with a simple hashing scheme, where independently trained random trees in a forest are acting as hashing functions. We the propose a subspace model as the splitting function, and show that it enforces the hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Machine Learning

1412.5083

Country: North America > Canada (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

Sindhwani, Vikas, Avron, Haim

arXiv.org Machine LearningApr-16-2015

In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the underlying statistical dependencies. Kernel methods fit this need well, as they constitute a versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems. However, their high computational costs (in storage and time) pose a significant barrier to their widespread adoption in big data applications. We propose an algorithmic framework and high-performance implementation for massive-scale training of kernel-based statistical models, based on combining two key technical ingredients: (i) distributed general purpose convex optimization, and (ii) the use of randomization to improve the scalability of kernel methods. Our approach is based on a block-splitting variant of the Alternating Directions Method of Multipliers, carefully reconfigured to handle very large random feature matrices, while exploiting hybrid parallelism typically found in modern clusters of multicore machines. Our implementation supports a variety of statistical learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in a highly extensible framework. We evaluate the ability of our framework to learn models on data from applications, and provide a comparison against existing sequential and parallel libraries.

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Machine Learning

1409.094

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

A Generative Model for Deep Convolutional Learning

Pu, Yunchen, Yuan, Xin, Carin, Lawrence

arXiv.org Machine LearningApr-15-2015

A generative model is developed for deep (multi-layered) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic learning. Experimental results demonstrate powerful capabilities of the model to learn multi-layer features from images, and excellent classification results are obtained on the MNIST and Caltech 101 datasets.

machine learning, natural language, pixel, (15 more...)

arXiv.org Machine Learning

1504.04054

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Deep Narrow Boltzmann Machines are Universal Approximators

Montufar, Guido

arXiv.org Machine LearningApr-10-2015

We show that deep narrow Boltzmann machines are universal approximators of probability distributions on the activities of their visible units, provided they have sufficiently many hidden layers, each containing the same number of units as the visible layer. We show that, within certain parameter domains, deep Boltzmann machines can be studied as feedforward networks. We provide upper and lower bounds on the sufficient depth and width of universal approximators. These results settle various intuitions regarding undirected networks and, in particular, they show that deep narrow Boltzmann machines are at least as compact universal approximators as narrow sigmoid belief networks and restricted Boltzmann machines, with respect to the currently available bounds for those models.

artificial intelligence, machine learning, probability distribution, (17 more...)

arXiv.org Machine Learning

1411.3784

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Scheduled denoising autoencoders

Geras, Krzysztof J., Sutton, Charles

arXiv.org Machine LearningApr-10-2015

We present a representation learning method that learns features at multiple different levels of scale. Working within the unsupervised framework of denoising autoencoders, we observe that when the input is heavily corrupted during training, the network tends to learn coarse-grained features, whereas when the input is only slightly corrupted, the network tends to learn fine-grained features. This motivates the scheduled denoising autoencoder, which starts with a high level of noise that lowers as training progresses. We find that the resulting representation yields a significant boost on a later supervised task compared to the original input, or to a standard denoising autoencoder trained at a single noise level. After supervised fine-tuning our best model achieves the lowest ever reported error on the CIFAR-10 data set among permutation-invariant methods.

artificial intelligence, machine learning, noise level, (18 more...)

arXiv.org Machine Learning

1406.3269

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Zero-bias autoencoders and the benefits of co-adapting features

Konda, Kishore, Memisevic, Roland, Krueger, David

arXiv.org Machine LearningApr-8-2015

Regularized training of an autoencoder typically results in hidden unit biases that take on large negative values. We show that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input data and act as a selection mechanism that ensures sparsity of the representation. We then show that negative biases impede the learning of data distributions whose intrinsic dimensionality is high. We also propose a new activation function that decouples the two roles of the hidden layer and that allows us to learn representations on data with very high intrinsic dimensionality, where standard autoencoders typically fail. Since the decoupled activation function acts like an implicit regularizer, the model can be trained by minimizing the reconstruction error of training data, without requiring any additional regularization.

artificial intelligence, autoencoder, machine learning, (16 more...)

arXiv.org Machine Learning

1402.3337

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Transferring Knowledge from a RNN to a DNN

Chan, William, Ke, Nan Rosemary, Lane, Ian

arXiv.org Machine LearningApr-7-2015

Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform DNNs counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach for embedded platforms is to either train a small DNN directly, or to train a small DNN that learns the output distribution of a large DNN. In this paper, we utilize a state-of-the-art RNN to transfer knowledge to small DNN. We use the RNN model to generate soft alignments and minimize the Kullback-Leibler divergence against the small DNN. The small DNN trained on the soft RNN alignments achieved a 3.93 WER on the Wall Street Journal (WSJ) eval92 task compared to a baseline 4.54 WER or more than 13% relative improvement.

alignment, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1504.01483

Genre: Research Report (0.50)

Industry: Media > News (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Explorations on high dimensional landscapes

Sagun, Levent, Guney, V. Ugur, Arous, Gerard Ben, LeCun, Yann

arXiv.org Machine LearningApr-6-2015

Finding minima of a real valued non-convex function over a high dimensional space is a major challenge in science. We provide evidence that some such functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This is in contrast with the low dimensional picture in which this band is wide. Our simulations agree with the previous theoretical work on spin glasses that proves the existence of such a band when the dimension of the domain tends to infinity. Furthermore our experiments on teacher-student networks with the MNIST dataset establish a similar phenomenon in deep networks. We finally observe that both the gradient descent and the stochastic gradient descent methods can reach this level within the same number of steps.

artificial intelligence, critical point, machine learning, (19 more...)

arXiv.org Machine Learning

1412.6615

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Unified Deep Neural Network for Speaker and Language Recognition

Richardson, Fred, Reynolds, Douglas, Dehak, Najim

arXiv.org Machine LearningApr-3-2015

ABSTRACT Learned feature representations and sub-phoneme posteriors from Deep Neural Networks (DNNs) have been used separately to produce significant performance gains for speaker and language recognition tasks. In this work we show how these gains are possible using a single DNN for both speaker and language recognition. The unified DNN approach is shown to yield substantial performance improvements on the the 2013 Domain Adaptation Challenge speaker recognition task (55% reduction in EER for the out-of-domain condition) and on the NIST 2011 Language Recognition Evaluation (48% reduction in EER for the 30s test condition). Index Terms: i-vector, DNN, bottleneck features, speaker recognition, language recognition 1. INTRODUCTION The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) [1] have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR) [2, 3, 4, 5, 6, 7, 8, 9]. Two general methods of applying DNN's to the SR and LR tasks have been shown to be effective.

artificial intelligence, bottleneck feature, machine learning, (19 more...)

arXiv.org Machine Learning

1504.00923

Country: North America > United States (0.94)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback