AITopics | Deep Learning


Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and they distinguish "deep learning" of these networks from previous work on artificial neural nets.


On Fast Dropout and its Applicability to Recurrent Networks

Bayer, Justin, Osendorfer, Christian, Korhammer, Daniela, Chen, Nutan, Urban, Sebastian, van der Smagt, Patrick

arXiv.org Machine Learning | Mar-5-2014

Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has been focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of the vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation-inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this is the absence of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. We positively test the hypothesis that this improves the performance of RNNs on four musical data sets.

1 Introduction

Recurrent Neural Networks are among the most powerful models for sequential data. The capability of representing any measurable sequence-to-sequence mapping to arbitrary accuracy (Hammer, 2000) makes them universal approximators. Nevertheless, they were given only little attention in the last two decades due to the problems of vanishing and exploding gradients (Hochreiter, 1991; Bengio et al., 1994; Pascanu et al., 2012).
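As a rough illustration of the idea behind fast dropout, the sketch below computes the mean and variance of a single pre-activation when each input is kept independently with some probability; a Gaussian with these moments is what stands in for sampling dropout masks. This is a minimal sketch under that assumption, not the paper's implementation, and the function names and the `keep_prob` parameter are hypothetical.

```python
import itertools


def dropout_moments_analytic(weights, inputs, keep_prob):
    """Mean and variance of sum_i z_i * w_i * x_i with independent z_i ~ Bernoulli(keep_prob)."""
    terms = [w * x for w, x in zip(weights, inputs)]
    mean = keep_prob * sum(terms)
    var = keep_prob * (1.0 - keep_prob) * sum(t * t for t in terms)
    return mean, var


def dropout_moments_exact(weights, inputs, keep_prob):
    """Same moments by enumerating every dropout mask (only feasible for tiny inputs)."""
    terms = [w * x for w, x in zip(weights, inputs)]
    mean = 0.0
    second_moment = 0.0
    for mask in itertools.product((0, 1), repeat=len(terms)):
        kept = sum(mask)
        prob = keep_prob ** kept * (1.0 - keep_prob) ** (len(terms) - kept)
        s = sum(m * t for m, t in zip(mask, terms))
        mean += prob * s
        second_moment += prob * s * s
    return mean, second_moment - mean * mean


# Hypothetical toy values, chosen only for illustration.
w = [0.5, -1.0, 2.0]
x = [1.0, 3.0, 0.25]
analytic = dropout_moments_analytic(w, x, keep_prob=0.8)
exact = dropout_moments_exact(w, x, keep_prob=0.8)
```

The closed-form moments agree with the brute-force enumeration, which is why the sampling step can be replaced by a deterministic computation on means and variances.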

artificial intelligence, dropout, machine learning, (17 more...)

arXiv.org Machine Learning

1311.0701

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Le Cam meets LeCun: Deficiency and Generic Feature Learning

van Rooyen, Brendan, Williamson, Robert C.

arXiv.org Machine Learning | Feb-21-2014

"Deep Learning" methods attempt to learn generic features in an unsupervised fashion from a large unlabelled data set. These generic features should perform as well as the best hand crafted features for any learning problem that makes use of this data. We provide a definition of generic features, characterize when it is possible to learn them and provide algorithms closely related to the deep belief network and autoencoders of deep learning. In order to do so we use the notion of deficiency distance and illustrate its value in studying certain general learning problems.

artificial intelligence, experiment, machine learning, (17 more...)

arXiv.org Machine Learning

1402.4884

Genre: Research Report (0.40)

Industry: Education > Focused Education > Special Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)


An Algorithm for Training Polynomial Networks

Livni, Roi, Shalev-Shwartz, Shai, Shamir, Ohad

arXiv.org Artificial Intelligence | Feb-20-2014

We consider deep neural networks, in which the output of each node is a quadratic function of its inputs. Similar to other deep architectures, these networks can compactly represent any function on a finite training set. The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training such networks, which we denote as the Basis Learner. The algorithm is a universal learner in the sense that the training error is guaranteed to decrease at every iteration, and can eventually reach zero under mild conditions. We present practical implementations of this algorithm, as well as preliminary experimental results. We also compare our deep architecture to other shallow architectures for learning polynomials, in particular kernel learning.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1304.7045

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Saxe, Andrew M., McClelland, James L., Ganguli, Surya

arXiv.org Machine Learning | Feb-19-2014

Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.
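The abstract's point about random orthogonal initial conditions can be made concrete with one property of a deep linear network: a product of orthogonal weight matrices preserves the norm of whatever passes through it, however many layers there are. The sketch below checks only that property and says nothing about learning dynamics; the sizes, the seed, the helper names, and the Gram-Schmidt construction are assumptions made for illustration.

```python
import math
import random


def random_orthogonal(n, rng):
    """Return an n x n matrix with orthonormal rows via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        length = math.sqrt(sum(a * a for a in v))
        if length > 1e-8:  # resample in the unlikely degenerate case
            rows.append([a / length for a in v])
    return rows


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def norm(vector):
    return math.sqrt(sum(a * a for a in vector))


def propagate(layers, vector):
    """Apply each layer's weight matrix in turn (a deep linear network, no nonlinearity)."""
    for weights in layers:
        vector = matvec(weights, vector)
    return vector


rng = random.Random(0)
n, depth = 8, 50
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

orth_layers = [random_orthogonal(n, rng) for _ in range(depth)]
# Scaled Gaussian weights with standard deviation 1/sqrt(n), for comparison.
gauss_layers = [
    [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
    for _ in range(depth)
]

ratio_orth = norm(propagate(orth_layers, x)) / norm(x)
ratio_gauss = norm(propagate(gauss_layers, x)) / norm(x)
```

With orthogonal layers the ratio stays at 1 up to rounding error regardless of depth, while the scaled Gaussian ratio generally drifts away from 1 as layers are added.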

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

1312.6120

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Improving Deep Neural Networks with Probabilistic Maxout Units

Springenberg, Jost Tobias, Riedmiller, Martin

arXiv.org Machine Learning | Feb-19-2014

We present a probabilistic variant of the recently introduced maxout unit. The success of deep neural networks utilizing maxout can partly be attributed to favorable performance under dropout, when compared to rectified linear units. However, it also depends on the fact that each maxout unit performs a pooling operation over a group of linear transformations and is thus partially invariant to changes in its input. Starting from this observation we ask the question: Can the desirable properties of maxout units be preserved while improving their invariance properties? We argue that our probabilistic maxout (probout) units successfully achieve this balance. We quantitatively verify this claim and report classification performance matching or exceeding the current state of the art on three challenging image classification benchmarks (CIFAR-10, CIFAR-100 and SVHN).
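To make the pooling operation concrete, the sketch below implements a single maxout unit as the maximum over a group of linear transformations of the input. It also includes one plausible stochastic variant that samples a linear piece with softmax probability. The abstract does not say how probout selects its output, so the sampling rule, the `temperature` parameter, and all function names here are assumptions.

```python
import math
import random


def linear_pieces(weights, biases, x):
    """Activations of the k linear transformations grouped under one unit."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]


def maxout(weights, biases, x):
    """Maxout unit: take the largest of the k linear activations."""
    return max(linear_pieces(weights, biases, x))


def sampled_unit(weights, biases, x, temperature, rng):
    """Assumed stochastic variant: return one activation drawn with softmax probability."""
    z = linear_pieces(weights, biases, x)
    top = max(z)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((v - top) / temperature) for v in z]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(z, weights=probs, k=1)[0]


# Hypothetical unit with k = 3 linear pieces over a 2-dimensional input.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.5]
x = [2.0, -1.0]

pooled = maxout(W, b, x)
draw = sampled_unit(W, b, x, temperature=1.0, rng=random.Random(0))
```

As the temperature approaches zero the sampled variant concentrates on the largest activation and so behaves like plain maxout.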

artificial intelligence, machine learning, probout unit, (19 more...)

arXiv.org Machine Learning

1312.6116

Country: Europe > Germany (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Deep learning for neuroimaging: a validation study

Plis, Sergey M., Hjelm, Devon R., Salakhutdinov, Ruslan, Calhoun, Vince D.

arXiv.org Machine Learning | Feb-19-2014

Vince D. Calhoun, The Mind Research Network, Albuquerque, NM 87106, vcalhoun@mrn.org

Deep learning methods have recently made notable advances in the tasks of classification and representation learning. These tasks are important for brain imaging and neuroscience discovery, making the methods attractive for porting to a neuroimager's toolbox. Success of these methods is, in part, explained by the flexibility of deep learning models. However, this flexibility makes the process of porting to new areas a difficult parameter optimization problem. In this work we demonstrate our results (and feasible parameter ranges) in application of deep learning methods to structural and functional brain imaging data. We also describe a novel constraint-based approach to visualizing high dimensional data. We use it to analyze the effect of parameter choices on data transformations. Our results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

1312.5847

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.24)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Correlation-based construction of neighborhood and edge features

Kégl, Balázs

arXiv.org Machine Learning | Feb-16-2014

Motivated by an abstract notion of low-level edge detector filters, we propose a simple method of unsupervised feature construction based on pairwise statistics of features. In the first step, we construct neighborhoods of features by regrouping features that correlate. Then we use these subsets as filters to produce new neighborhood features. Next, we connect neighborhood features that correlate, and construct edge features by subtracting the correlated neighborhood features from each other. To validate the usefulness of the constructed features, we ran AdaBoost.MH on four multi-class classification problems. Our most significant result is a test error of 0.94% on MNIST with an algorithm which is essentially free of any image-specific priors. On CIFAR-10 our method is suboptimal compared to today's best deep learning techniques; nevertheless, we show that the proposed method outperforms not only boosting on the raw pixels, but also boosting on Haar filters.
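The construction steps in this abstract can be sketched in a few lines: group features whose pairwise correlation is high, aggregate each group into a neighborhood feature, and subtract one neighborhood feature from another to form an edge feature. The abstract specifies neither the correlation threshold nor how a neighborhood is aggregated, so the `threshold` parameter, aggregation by summation, and all function names below are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length feature columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def neighborhoods(columns, threshold):
    """For each feature, the indices of features correlating with it at or above threshold."""
    return [
        [j for j, other in enumerate(columns) if pearson(col, other) >= threshold]
        for col in columns
    ]


def neighborhood_feature(columns, members):
    """Sum the member features per example (aggregation by sum is an assumption)."""
    n_examples = len(columns[0])
    return [sum(columns[j][i] for j in members) for i in range(n_examples)]


def edge_feature(feature_a, feature_b):
    """Difference of two neighborhood features, example by example."""
    return [a - b for a, b in zip(feature_a, feature_b)]


# Hypothetical data: four features (columns) observed on four examples.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
]
groups = neighborhoods(columns, threshold=0.9)
feature_01 = neighborhood_feature(columns, groups[0])
```

In the abstract, edge features are built only between neighborhood features that themselves correlate, so a fuller version would apply `pearson` again at that second level before calling `edge_feature`.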

artificial intelligence, edge feature, machine learning, (19 more...)

arXiv.org Machine Learning

1312.7335

Country: North America > Canada (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities

Gupta, Parth, Banchs, Rafael E., Rosso, Paolo

arXiv.org Machine Learning | Feb-13-2014

We present a comprehensive study on the use of autoencoders for modelling text data, in which (unlike previous studies) we focus our attention on the following issues: i) we explore the suitability of two different models, bDA and rsDA, for constructing deep autoencoders for text data at the sentence level; ii) we propose and evaluate two novel metrics for better assessing the text-reconstruction capabilities of autoencoders; and iii) we propose an automatic method to find the critical bottleneck dimensionality for text language representations (below which structural information is lost).

autoencoder, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1402.3070

Country: North America > United States (0.47)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Learning to encode motion using spatio-temporal synchrony

Konda, Kishore Reddy, Memisevic, Roland, Michalski, Vincent

arXiv.org Machine Learning | Feb-10-2014

We consider the task of learning to extract motion from videos. To this end, we show that the detection of spatial transformations can be viewed as the detection of synchrony between the image sequence and a sequence of features undergoing the motion we wish to detect. We show that learning about synchrony is possible using very fast, local learning rules, by introducing multiplicative "gating" interactions between hidden units across frames. This makes it possible to achieve competitive performance in a wide variety of motion estimation tasks, using a small fraction of the time required to learn features, and to outperform hand-crafted spatio-temporal features by a large margin. We also show how learning about synchrony can be viewed as performing greedy parameter estimation in the well-known motion energy model.

artificial intelligence, machine learning, synchrony, (18 more...)

arXiv.org Machine Learning

1306.3162

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


Modeling sequential data using higher-order relational features and predictive training

Michalski, Vincent, Memisevic, Roland, Konda, Kishore

arXiv.org Machine Learning | Feb-10-2014

Bi-linear feature learning models, like the gated autoencoder, were proposed as a way to model relationships between frames in a video. By minimizing reconstruction error of one frame, given the previous frame, these models learn "mapping units" that encode the transformations inherent in a sequence, and thereby learn to encode motion. In this work we extend bi-linear models by introducing "higher-order mapping units" that allow us to encode transformations between frames and transformations between transformations. We show that this makes it possible to encode temporal structure that is more complex and longer-range than the structure captured within standard bi-linear models. We also show that a natural way to train the model is by replacing the commonly used reconstruction objective with a prediction objective which forces the model to correctly predict the evolution of the input multiple steps into the future. Learning can be achieved by back-propagating the multi-step prediction through time. We test the model on various temporal prediction tasks, and show that higher-order mappings and predictive training both yield a significant improvement over bi-linear models in terms of prediction accuracy.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

1402.2333

Country: Europe > Germany (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
