AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Sak, Haşim, Senior, Andrew, Beaufays, Françoise

arXiv.org Machine LearningFeb-5-2014

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic connections making them powerful for modeling sequences. They have been successfully used for sequence labeling and sequence prediction tasks, such as handwriting recognition, language modeling, phonetic labeling of acoustic frames. However, in contrast to the deep neural networks, the use of RNNs in speech recognition has been limited to phone recognition in small scale tasks. In this paper, we present novel LSTM based RNN architectures which make more effective use of model parameters to train acoustic models for large vocabulary speech recognition. We train and compare LSTM, RNN and DNN models at various numbers of parameters and configurations. We show that LSTM models converge quickly and give state of the art speech recognition performance for relatively small sized models.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1402.1128

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Ordered Representations with Nested Dropout

Rippel, Oren, Gelbart, Michael A., Adams, Ryan P.

arXiv.org Machine LearningFeb-4-2014

In this paper, we study ordered representations of data in which different dimensions have different degrees of importance. To learn these representations we introduce nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network. We first present a sequence of theoretical results in the simple case of a semi-linear autoencoder. We rigorously show that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA. We then extend the algorithm to deep models and demonstrate the relevance of ordered representations to a number of applications. Specifically, we use the ordered property of the learned codes to construct hash-based data structures that permit very fast retrieval, achieving retrieval in time logarithmic in the database size and independent of the dimensionality of the representation. This allows codes that are hundreds of times longer than currently feasible for retrieval. We therefore avoid the diminished quality associated with short codes, while still performing retrieval that is competitive in speed with existing methods. We also show that ordered representations are a promising way to learn adaptive compression for efficient online data reconstruction.

artificial intelligence, machine learning, representation, (13 more...)

arXiv.org Machine Learning

1402.0915

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

Montúfar, Guido F.

arXiv.org Machine LearningJan-28-2014

A deep belief network (DBN) (Hinton et al., 2006) is a layered stochastic network with undirected bipartite interactions between the units in the top two layers, and directed bipartite interactions between the units in all other subsequent pairs of layers, directed towards the bottom layer. The top two layers form a restricted Boltzmann machine (RBM) (Smolensky, 1986). The entire network defines a model of probability distributions on the states of the units in the bottom layer, the visible layer. When the number of units in every layer has the same order of magnitude, the network is called narrow . The depth refers to the number of layers. Deep network architectures are believed to play a key role in information processing of intelligent agents, see (Bengio, 2009) for an overview on this exciting topic. DBNs were the first deep architectures to be envisaged together with an efficient unsupervised training algorithm (Hinton et al., 2006).

artificial intelligence, machine learning, probability distribution, (17 more...)

arXiv.org Machine Learning

1303.7461

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Add feedback

Multimodal Transitions for Generative Stochastic Networks

Ozair, Sherjil, Yao, Li, Bengio, Yoshua

arXiv.org Machine LearningJan-24-2014

Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution. The result of training is therefore a machine that generates samples through this Markov chain. However, the previously introduced GSN consistency theorems suggest that in order to capture a wide class of distributions, the transition operator in general should be multimodal, something that has not been done before this paper. We introduce for the first time multimodal transition distributions for GSNs, in particular using models in the NADE family (Neural Autoregressive Density Estimator) as output distributions of the transition operator. A NADE model is related to an RBM (and can thus model multimodal distributions) but its likelihood (and likelihood gradient) can be computed easily. The parameters of the NADE are obtained as a learned function of the previous state of the learned Markov chain. Experiments clearly illustrate the advantage of such multimodal transition distributions over unimodal GSNs.

artificial intelligence, bengio, machine learning, (16 more...)

arXiv.org Machine Learning

1312.5578

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.78)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Add feedback

Gaussian-binary Restricted Boltzmann Machines on Modeling Natural Image Statistics

Wang, Nan, Melchior, Jan, Wiskott, Laurenz

arXiv.org Machine LearningJan-23-2014

We present a theoretical analysis of Gaussian-binary restricted Boltzmann machines (GRBMs) from the perspective of density models. The key aspect of this analysis is to show that GRBMs can be formulated as a constrained mixture of Gaussians, which gives a much better insight into the model's capabilities and limitations. We show that GRBMs are capable of learning meaningful features both in a two-dimensional blind source separation task and in modeling natural images. Further, we show that reported difficulties in training GRBMs are due to the failure of the training algorithm rather than the model itself. Based on our analysis we are able to propose several training recipes, which allowed successful and fast training in our experiments. Finally, we discuss the relationship of GRBMs to several modifications that have been proposed to improve the model.

artificial intelligence, grbm, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1371/journal.pone.0171015

1401.59

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

Compositional Operators in Distributional Semantics

Kartsaklis, Dimitri

arXiv.org Artificial IntelligenceJan-21-2014

The recent developments on the syntactical and morphological analysis of natural language text constitute the first step towards a more ambitious goal, that of assigning a proper form of meaning to arbitrary text compounds. Indeed, for certain really "intelligent" applications, such as machine translation, question-answering systems, paraphrase detection, or automatic essay scoring, to name just a few, there will always exist a gap between raw linguistic information (such as part-of-speech labels, for example) and the knowledge of the real world that is needed for the completion of the task in a satisfactory way. Semantic analysis has exactly this role, aiming to close (or reduce as much as possible) this gap by linking the linguistic information with semantic representations that embody this elusive real-world knowledge. The traditional way of adding semantics to sentences is a syntax-driven compositional approach: every word in the sentence is associated with a primitive symbol or a predicate, and these are combined to larger and larger logical forms based on the syntactical rules of the grammar. At the end of the syntactical analysis, the logical representation of the whole sentence is a complex formula that can be fed to a theorem prover for further processing. Although such an approach seems intuitive, it has been shown that it is rather inefficient for any practical application (for example, Bos and Markert (2006) get very low recall scores for a textual entailment task).

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s40362-014-0017-z

1401.5327

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report (0.63)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Approximated Infomax Early Stopping: Revisiting Gaussian RBMs on Natural Images

Kiwaki, Taichi, Makino, Takaki, Aihara, Kazuyuki

arXiv.org Machine LearningJan-6-2014

We pursue an early stopping technique that helps Gaussian Restricted Boltzmann Machines (GRBMs) to gain good natural image representations in terms of overcompleteness and data fitting. GRBMs are widely considered as an unsuitable model for natural images because they gain non-overcomplete representations which include uniform filters that do not represent useful image features. We have recently found that GRBMs once gain and subsequently lose useful filters during their training, contrary to this common perspective. We attribute this phenomenon to a tradeoff between overcompleteness of GRBM representations and data fitting. To gain GRBM representations that are overcomplete and fit data well, we propose a measure for GRBM representation quality, approximated mutual information, and an early stopping technique based on this measure. The proposed method boosts performance of classifiers trained on GRBM representations.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Machine Learning

1312.5412

Country:

Asia > Japan (0.15)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Add feedback

An empirical analysis of dropout in piecewise linear networks

Warde-Farley, David, Goodfellow, Ian J., Courville, Aaron, Bengio, Yoshua

arXiv.org Machine LearningJan-2-2014

The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters. In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function. We investigate the quality of the test time weight-scaling inference procedure by evaluating the geometric average exactly in small models, as well as compare the performance of the geometric mean to the arithmetic mean more commonly employed by ensemble techniques. We explore the effect of tied weights on the ensemble interpretation by training ensembles of masked networks without tied weights. Finally, we investigate an alternative criterion based on a biased estimator of the maximum likelihood ensemble gradient.

dropout, ensemble, geometric mean, (15 more...)

arXiv.org Machine Learning

1312.6197

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Learning Deep Representation Without Parameter Inference for Nonlinear Dimensionality Reduction

Zhang, Xiao-Lei

arXiv.org Machine LearningJan-2-2014

Unsupervised deep learning is one of the most powerful representation learning techniques. Restricted Boltzman machine, sparse coding, regularized auto-encoders, and convolutional neural networks are pioneering building blocks of deep learning. In this paper, we propose a new building block -- distributed random models. The proposed method is a special full implementation of the product of experts: (i) each expert owns multiple hidden units and different experts have different numbers of hidden units; (ii) the model of each expert is a k-center clustering, whose k-centers are only uniformly sampled examples, and whose output (i.e. the hidden units) is a sparse code that only the similarity values from a few nearest neighbors are reserved. The relationship between the pioneering building blocks, several notable research branches and the proposed method is analyzed. Experimental results show that the proposed deep model can learn better representations than deep belief networks and meanwhile can train a much larger network with much less time than deep belief networks.

learning deep representation, nonlinear dimensionality reduction, parameter inference

arXiv.org Machine Learning

1308.4922

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Relaxations for inference in restricted Boltzmann machines

Wang, Sida I., Frostig, Roy, Liang, Percy, Manning, Christopher D.

arXiv.org Machine LearningJan-2-2014

We propose a relaxation-based approximate inference algorithm that samples near-MAP configurations of a binary pairwise Markov random field. We experiment on MAP inference tasks in several restricted Boltzmann machines. We also use our underlying sampler to estimate the log-partition function of restricted Boltzmann machines and compare against other sampling-based methods.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

1312.6205

Country:

North America > United States > California > Santa Clara County (0.15)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback