AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

Knowledge Transfer Pre-training

Tang, Zhiyuan, Wang, Dong, Pan, Yiqiao, Zhang, Zhiyong

arXiv.org Machine LearningJun-7-2015

Pre-training is crucial for learning deep neural networks. Most of existing pre-training methods train simple models (e.g., restricted Boltzmann machines) and then stack them layer by layer to form the deep structure. This layer-wise pre-training has found strong theoretical foundation and broad empirical support. However, it is not easy to employ such method to pre-train models without a clear multi-layer structure,e.g., recurrent neural networks (RNNs). This paper presents a new pre-training approach based on knowledge transfer learning. In contrast to the layer-wise approach which trains model components incrementally, the new approach trains the entire model as a whole but with an easier objective function. This is achieved by utilizing soft targets produced by a prior trained model (teacher model). Compared to the conventional layer-wise methods, this new method does not care about the model structure, so can be used to pre-train very complex models. Experiments on a speech recognition task demonstrated that with this approach, complex RNNs can be well trained with a weaker deep neural network (DNN) model. Furthermore, the new method can be combined with conventional layer-wise pre-training to deliver additional gains.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1506.02256

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MADE: Masked Autoencoder for Distribution Estimation

Germain, Mathieu, Gregor, Karol, Murray, Iain, Larochelle, Hugo

arXiv.org Machine LearningJun-5-2015

There has been a lot of recent interest in designing neural network models to estimate a distribution from a set of examples. We introduce a simple modification for autoencoder neural networks that yields powerful generative models. Our method masks the autoencoder's parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. Constrained this way, the autoencoder outputs can be interpreted as a set of conditional probabilities, and their product, the full joint probability. We can also train a single network that can decompose the joint probability in multiple different orderings. Our simple framework can be applied to multiple architectures, including deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1502.03509

Country:

North America > Canada (0.29)
North America > United States > California (0.28)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

On the Stability of Deep Networks

Giryes, Raja, Sapiro, Guillermo, Bronstein, Alex M.

arXiv.org Machine LearningJun-3-2015

In this work we study the properties of deep neural networks (DNN) with random weights. We formally prove that these networks perform a distance-preserving embedding of the data. Based on this we then draw conclusions on the size of the training data and the networks' structure. A longer version of this paper with more results and details can be found in (Giryes et al., 2015). In particular, we formally prove in the longer version that DNN with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data.

artificial intelligence, machine learning, workshop contribution, (18 more...)

arXiv.org Machine Learning

1412.5896

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Blocks and Fuel: Frameworks for deep learning

van Merriënboer, Bart, Bahdanau, Dzmitry, Dumoulin, Vincent, Serdyuk, Dmitriy, Warde-Farley, David, Chorowski, Jan, Bengio, Yoshua

arXiv.org Machine LearningJun-1-2015

We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA-support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano's symbolic computational graph, and providing an extensive set of utilities to assist training the networks, e.g. training algorithms, logging, monitoring, visualization, and serialization. Fuel provides a standard format for machine learning datasets. It allows the user to easily iterate over large datasets, performing many types of pre-processing on the fly.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1506.00619

Country: North America > Canada > Quebec > Montreal (0.21)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Imaging Time-Series to Improve Classification and Imputation

Wang, Zhiguang, Oates, Tim

arXiv.org Machine LearningMay-31-2015

Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoised Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%-48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.

artificial intelligence, machine learning, time sery, (17 more...)

arXiv.org Machine Learning

1506.00327

Country: North America > United States > Maryland (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Linear Dynamical System Model for Text

Belanger, David, Kakade, Sham

arXiv.org Machine LearningMay-31-2015

Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words' local context, a natural way to induce context-dependent representations is to perform inference in a probabilistic latent-variable sequence model. Given the recent success of continuous vector space word representations, we provide such an inference procedure for continuous states, where words' representations are given by the posterior mean of a linear dynamical system. Here, efficient inference can be performed using Kalman filtering. Our learning algorithm is extremely scalable, operating on simple cooccurrence counts for both parameter initialization using the method of moments and subsequent iterations of EM. In our experiments, we employ our inferred word embeddings as features in standard tagging tasks, obtaining significant accuracy improvements. Finally, the Kalman filter updates can be seen as a linear recurrent neural network. We demonstrate that using the parameters of our model to initialize a non-linear recurrent neural network language model reduces its training time by a day and yields lower perplexity.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

1502.04081

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Hadjis, Stefan, Abuzaid, Firas, Zhang, Ce, Ré, Christopher

arXiv.org Machine LearningMay-26-2015

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5x throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.

artificial intelligence, caffe, machine learning, (15 more...)

arXiv.org Machine Learning

1504.04343

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Weight Uncertainty in Neural Networks

Blundell, Charles, Cornebise, Julien, Kavukcuoglu, Koray, Wierstra, Daan

arXiv.org Machine LearningMay-21-2015

We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.

deep learning, neural network, upstream oil & gas, (15 more...)

arXiv.org Machine Learning

1505.05424

Country:

Europe > France (0.28)
Asia (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Add feedback

Qualitatively characterizing neural network optimization problems

Goodfellow, Ian J., Vinyals, Oriol, Saxe, Andrew M.

arXiv.org Machine LearningMay-21-2015

Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve optimization, such as unsupervised pretraining. However, modern neural networks are able to achieve negligible training error on complex tasks, using only direct training with stochastic gradient descent. We introduce a simple analysis technique to look for evidence that such networks are overcoming local optima. We find that, in fact, on a straight path from initialization to solution, a variety of state of the art neural networks never encounter any significant obstacles.

artificial intelligence, experiment, machine learning, (15 more...)

arXiv.org Machine Learning

1412.6544

Country:

North America > Canada (0.28)
North America > United States > California > Santa Clara County (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

Salimans, Tim, Kingma, Diederik P., Welling, Max

arXiv.org Machine LearningMay-19-2015

Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.

approximation, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

1410.646

Country:

Europe (0.28)
Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
(2 more...)

Add feedback