AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

A Linear Dynamical System Model for Text

Belanger, David, Kakade, Sham

arXiv.org Machine LearningMay-31-2015

Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words' local context, a natural way to induce context-dependent representations is to perform inference in a probabilistic latent-variable sequence model. Given the recent success of continuous vector space word representations, we provide such an inference procedure for continuous states, where words' representations are given by the posterior mean of a linear dynamical system. Here, efficient inference can be performed using Kalman filtering. Our learning algorithm is extremely scalable, operating on simple cooccurrence counts for both parameter initialization using the method of moments and subsequent iterations of EM. In our experiments, we employ our inferred word embeddings as features in standard tagging tasks, obtaining significant accuracy improvements. Finally, the Kalman filter updates can be seen as a linear recurrent neural network. We demonstrate that using the parameters of our model to initialize a non-linear recurrent neural network language model reduces its training time by a day and yields lower perplexity.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

1502.04081

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Hadjis, Stefan, Abuzaid, Firas, Zhang, Ce, Ré, Christopher

arXiv.org Machine LearningMay-26-2015

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5x throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.

artificial intelligence, caffe, machine learning, (15 more...)

arXiv.org Machine Learning

1504.04343

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Weight Uncertainty in Neural Networks

Blundell, Charles, Cornebise, Julien, Kavukcuoglu, Koray, Wierstra, Daan

arXiv.org Machine LearningMay-21-2015

We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.

deep learning, neural network, upstream oil & gas, (15 more...)

arXiv.org Machine Learning

1505.05424

Country:

Europe > France (0.28)
Asia (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Add feedback

Qualitatively characterizing neural network optimization problems

Goodfellow, Ian J., Vinyals, Oriol, Saxe, Andrew M.

arXiv.org Machine LearningMay-21-2015

Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve optimization, such as unsupervised pretraining. However, modern neural networks are able to achieve negligible training error on complex tasks, using only direct training with stochastic gradient descent. We introduce a simple analysis technique to look for evidence that such networks are overcoming local optima. We find that, in fact, on a straight path from initialization to solution, a variety of state of the art neural networks never encounter any significant obstacles.

artificial intelligence, experiment, machine learning, (15 more...)

arXiv.org Machine Learning

1412.6544

Country:

North America > Canada (0.28)
North America > United States > California > Santa Clara County (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

Salimans, Tim, Kingma, Diederik P., Welling, Max

arXiv.org Machine LearningMay-19-2015

Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.

approximation, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

1410.646

Country:

Europe (0.28)
Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
(2 more...)

Add feedback

HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition

Yan, Zhicheng, Zhang, Hao, Piramuthu, Robinson, Jagadeesh, Vignesh, DeCoste, Dennis, Di, Wei, Yu, Yizhou

arXiv.org Machine LearningMay-15-2015

In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNN) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global finetuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional executions of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both CIFAR100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build up three different HD-CNNs and they lower the top-1 error of the standard CNNs by 2.65%, 3.1% and 1.1%, respectively.

artificial intelligence, category, machine learning, (17 more...)

arXiv.org Machine Learning

1410.0736

Country: North America (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Training generative neural networks via Maximum Mean Discrepancy optimization

Dziugaite, Gintare Karolina, Roy, Daniel M., Ghahramani, Zoubin

arXiv.org Machine LearningMay-14-2015

We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic---informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an unbiased estimate of the maximum mean discrepancy, which is the centerpiece of the nonparametric kernel two-sample test proposed by Gretton et al. (2012). We compare to the adversarial nets framework introduced by Goodfellow et al. (2014), in which learning is a two-player game between a generator network and an adversarial discriminator network, both trained to outwit the other. From this perspective, the MMD statistic plays the role of the discriminator. In addition to empirical comparisons, we prove bounds on the generalization error incurred by optimizing the empirical MMD.

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Machine Learning

1505.03906

Country: North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Petuum: A New Platform for Distributed Machine Learning on Big Data

Xing, Eric P., Ho, Qirong, Dai, Wei, Kim, Jin Kyu, Wei, Jinliang, Lee, Seunghak, Zheng, Xun, Xie, Pengtao, Kumar, Abhimanu, Yu, Yaoliang

arXiv.org Machine LearningMay-14-2015

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

1312.7651

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Modeling Compositionality with Multiplicative Recurrent Neural Networks

İrsoy, Ozan, Cardie, Claire

arXiv.org Machine LearningMay-2-2015

We present the multiplicative recurrent neural network as a general model for compositional meaning in language, and evaluate it on the task of fine-grained sentiment analysis. We establish a connection to the previously investigated matrix-space models for compositionality, and show they are special cases of the multiplicative recurrent net. Our experiments show that these models perform comparably or better than Elman-type additive recurrent neural networks and outperform matrix-space models on a standard fine-grained sentiment analysis corpus. Furthermore, they yield comparable results to structural deep models on the recently published Stanford Sentiment Treebank without the need for generating parse trees.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1412.6577

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Advanced Mean Field Theory of Restricted Boltzmann Machine

Huang, Haiping, Toyoizumi, Taro

arXiv.org Machine LearningMay-1-2015

Learning in restricted Boltzmann machine is typically hard due to the computation of gradients of log-likelihood function. To describe the network state statistics of the restricted Boltzmann machine, we develop an advanced mean field theory based on the Bethe approximation. Our theory provides an efficient message passing based method that evaluates not only the partition function (free energy) but also its gradients without requiring statistical sampling. The results are compared with those obtained by the computationally expensive sampling based method.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

doi: 10.1103/PhysRevE.91.050101

1502.00186

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

Add feedback