AITopics | Deep Learning

Deep Learning

New computational algorithms make it possible to train neural networks with many input nodes and many layers. This "deep learning" is what distinguishes these networks from earlier work on artificial neural nets.

Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition

Ciresan, Dan Claudiu, Meier, Ueli, Gambardella, Luca Maria, Schmidhuber, Juergen

arXiv.org Artificial Intelligence, Mar-1-2010

Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images, and graphics cards to greatly speed up learning.
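As a rough illustration of what "on-line back-propagation for plain multi-layer perceptrons" refers to, the sketch below updates the weights after every single training example instead of after a batch. It is a toy in pure Python, not the authors' GPU implementation: the single hidden layer, tanh hidden units, linear outputs, squared-error loss, learning rate, XOR data, and the names `init_mlp`, `forward`, `loss`, and `online_step` are all assumptions made for this example.

```python
import math
import random


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Small random weights; each row holds one unit's weights plus a trailing bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w1, w2


def forward(w1, w2, x):
    """Return (hidden activations, outputs) for one input vector."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    y = [sum(w * v for w, v in zip(row, h + [1.0])) for row in w2]
    return h, y


def loss(w1, w2, x, t):
    """Half squared error on a single example."""
    _, y = forward(w1, w2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def online_step(w1, w2, x, t, lr=0.05):
    """One on-line update: back-propagate the error of a single example, in place."""
    h, y = forward(w1, w2, x)
    delta_out = [yk - tk for yk, tk in zip(y, t)]  # dL/dy for linear outputs
    # Hidden deltas use the pre-update output weights; tanh'(a) = 1 - tanh(a)^2.
    delta_hid = [
        (1.0 - h[j] ** 2) * sum(delta_out[k] * w2[k][j] for k in range(len(w2)))
        for j in range(len(w1))
    ]
    for k, row in enumerate(w2):
        for j, v in enumerate(h + [1.0]):
            row[j] -= lr * delta_out[k] * v
    for j, row in enumerate(w1):
        for i, v in enumerate(x + [1.0]):
            row[i] -= lr * delta_hid[j] * v


# Toy data (XOR), visited one example at a time.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
w1, w2 = init_mlp(2, 4, 1)
for epoch in range(200):
    for x, t in data:
        online_step(w1, w2, x, t)
```

The abstract's other ingredients (many hidden layers, many neurons per layer, deformed training images, graphics cards) are deliberately left out; the sketch only shows the per-example update rule.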

deep learning, neural network, threadidx, (19 more...)

doi: 10.1162/NECO_a_00052

1003.0358

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Classifying Network Data with Deep Kernel Machines

Tang, Xiao, Zhu, Mu

arXiv.org Machine Learning, Jan-22-2010

Inspired by a growing interest in analyzing network data, we study the problem of node classification on graphs, focusing on approaches based on kernel machines. Conventionally, kernel machines are linear classifiers in the implicit feature space. We argue that linear classification in the feature space of kernels commonly used for graphs is often not enough to produce good results. When this is the case, one naturally considers nonlinear classifiers in the feature space. We show that repeating this process produces something we call "deep kernel machines." We provide some examples where deep kernel machines can make a big difference in classification performance, and point out some connections to various recent literature on deep architectures in artificial intelligence and machine learning.

artificial intelligence, kernel machine, machine learning, (17 more...)

1001.4019

Country:

North America > Canada (0.28)
North America > United States (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry:

Energy (0.69)
Law (0.68)
Telecommunications > Networks (0.61)
Information Technology > Networks (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Monte Carlo Algorithm for Universally Optimal Bayesian Sequence Prediction and Planning

Di Franco, Anthony

arXiv.org Artificial Intelligence, Jan-17-2010

The aim of this work is to address the question of whether we can in principle design rational decision-making agents or artificial intelligences embedded in computable physics such that their decisions are optimal in reasonable mathematical senses. Recent developments in rare event probability estimation, recursive Bayesian inference, neural networks, and probabilistic planning are sufficient to explicitly approximate reinforcement learners of the AIXI style with non-trivial model classes (here, the class of resource-bounded Turing machines). Consideration of the effects of resource limitations in a concrete implementation leads to insights about possible architectures for learning systems using optimal decision makers as components.

artificial intelligence, machine learning, sequence, (12 more...)

1001.2813

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Finland > Ostrobothnia > Vaasa (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

An Empirical Evaluation of Four Algorithms for Multi-Class Classification: Mart, ABC-Mart, Robust LogitBoost, and ABC-LogitBoost

Li, Ping

arXiv.org Artificial Intelligence, Jan-7-2010

This empirical study is mainly devoted to comparing four tree-based boosting algorithms: mart, abc-mart, robust logitboost, and abc-logitboost, for multi-class classification on a variety of publicly available datasets. Some of those datasets have been thoroughly tested in prior studies using a broad range of classification algorithms including SVM, neural nets, and deep learning. In terms of the empirical classification errors, our experiment results demonstrate: 1. Abc-mart considerably improves mart. 2. Abc-logitboost considerably improves (robust) logitboost. 3. (Robust) logitboost considerably improves mart on most datasets. 4. Abc-logitboost considerably improves abc-mart on most datasets. 5. These four boosting algorithms (especially abc-logitboost) outperform SVM on many datasets. 6. Compared to the best deep learning methods, these four boosting algorithms (especially abc-logitboost) are competitive.

artificial intelligence, deep learning, machine learning, (15 more...)

1001.102

Country: North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Implicit Mixtures of Restricted Boltzmann Machines

Nair, Vinod, Hinton, Geoffrey E.

Neural Information Processing Systems, Dec-31-2009

We present a mixture model whose components are Restricted Boltzmann Machines (RBMs). This possibility has not been considered before because computing the partition function of an RBM is intractable, which appears to make learning a mixture of RBMs intractable as well. Surprisingly, when formulated as a third-order Boltzmann machine, such a mixture model can be learned tractably using contrastive divergence. The energy function of the model captures three-way interactions among visible units, hidden units, and a single hidden multinomial unit that represents the cluster labels. The distinguishing feature of this model is that, unlike other mixture models, the mixing proportions are not explicitly parameterized. Instead, they are defined implicitly via the energy function and depend on all the parameters in the model. We present results for the MNIST and NORB datasets showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.
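To make "defined implicitly via the energy function" concrete, the sketch below assumes a bias-free energy with three-way weights only, where `weights[k][j][i]` couples visible unit i, hidden unit j, and component k. Summing out the binary hidden units of component k gives a free energy, and the probability of each component given an input is a softmax over the negative free energies, so no explicit mixing proportions appear anywhere. The weight values and the function names `free_energy` and `responsibilities` are hypothetical, and training by contrastive divergence is not shown.

```python
import math


def free_energy(weights_k, v):
    """Free energy of one component RBM with binary hidden units summed out (no biases)."""
    total = 0.0
    for row in weights_k:  # one row of weights per hidden unit
        a = sum(w * vi for w, vi in zip(row, v))
        # log(1 + exp(a)), written to avoid overflow for large |a|
        total += max(a, 0.0) + math.log1p(math.exp(-abs(a)))
    return -total


def responsibilities(weights, v):
    """P(component k | v): a softmax over negative free energies."""
    neg_f = [-free_energy(wk, v) for wk in weights]
    m = max(neg_f)
    exps = [math.exp(f - m) for f in neg_f]
    z = sum(exps)
    return [e / z for e in exps]


# Two hypothetical components, each with 2 hidden units over 3 visible units.
weights = [
    [[2.0, 2.0, 0.0], [1.0, 1.0, 0.0]],   # component 0 responds to the first two pixels
    [[0.0, 0.0, 2.0], [0.0, -1.0, 1.0]],  # component 1 responds to the last pixel
]
print(responsibilities(weights, [1, 1, 0]))
print(responsibilities(weights, [0, 0, 1]))
```

Changing any weight changes the free energies and therefore the component probabilities, which is the sense in which the mixing proportions depend on all the parameters.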

component rbm, mixture model, rbm, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > District of Columbia > Washington (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks

Graves, Alex, Schmidhuber, Jürgen

Neural Information Processing Systems, Dec-31-2009

Offline handwriting recognition (the transcription of images of handwritten text) is an interesting task, in that it combines computer vision with sequence learning. In most systems the two elements are handled separately, with sophisticated preprocessing techniques used to extract the image features and sequential models such as HMMs used to provide the transcriptions. By combining two recent innovations in neural networks (multidimensional recurrent neural networks and connectionist temporal classification), this paper introduces a globally trained offline handwriting recogniser that takes raw pixel data as input. Unlike competing systems, it does not require any alphabet-specific preprocessing, and can therefore be used unchanged for any language. Evidence of its generality and power is provided by data from a recent international Arabic recognition competition, where it outperformed all entries (91.4% accuracy compared to 87.2% for the competition winner) despite the fact that neither author understands a word of Arabic.
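Connectionist temporal classification lets a network emit one label, or a special blank, per input frame without needing a frame-level alignment. The sketch below shows only the collapsing rule commonly used to read a transcription off such a per-frame label sequence: merge consecutive repeats, then drop blanks. It is not the paper's recogniser; the blank symbol `-` and the function name `ctc_collapse` are assumptions for this example.

```python
BLANK = "-"  # hypothetical blank symbol


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


print(ctc_collapse(list("hh-e-ll-lo")))  # prints hello
```

The blank between the two runs of `l` is what allows a genuinely doubled letter to survive the merge step.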

artificial intelligence, deep learning, machine learning, (15 more...)

Country:

Europe (0.68)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Deep Learning with Kernel Regularization for Visual Recognition

Yu, Kai, Xu, Wei, Gong, Yihong

Neural Information Processing Systems, Dec-31-2009

In this paper we focus on training deep neural networks for visual recognition tasks. One challenge is the lack of an informative regularization on the network parameters, to imply a meaningful control on the computed function. We propose a training strategy that takes advantage of kernel methods, where an existing kernel function represents useful prior knowledge about the learning task of interest. We derive an efficient algorithm using stochastic gradient descent, and demonstrate very positive results in a wide range of visual recognition tasks.

artificial intelligence, machine learning, recognition, (13 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Recurrent Temporal Restricted Boltzmann Machine

Sutskever, Ilya, Hinton, Geoffrey E., Taylor, Graham W.

Neural Information Processing Systems, Dec-31-2009

The Temporal Restricted Boltzmann Machine (TRBM) is a probabilistic model for sequences that is able to successfully model (i.e., generate nice-looking samples of) several very high dimensional sequences, such as motion capture data and the pixels of low resolution videos of balls bouncing in a box. The major disadvantage of the TRBM is that exact inference is extremely hard, since even computing a Gibbs update for a single variable of the posterior is exponentially expensive. This difficulty has necessitated the use of a heuristic inference procedure, which nonetheless was accurate enough for successful learning. In this paper we introduce the Recurrent TRBM, which is a very slight modification of the TRBM for which exact inference is very easy and exact gradient learning is almost tractable. We demonstrate that the RTRBM is better than an analogous TRBM at generating motion capture and videos of bouncing balls.

artificial intelligence, machine learning, trbm, (19 more...)

Country: North America > Canada > Ontario (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Evaluating probabilities under high-dimensional latent variable models

Murray, Iain, Salakhutdinov, Ruslan R.

Neural Information Processing Systems, Dec-31-2009

We present a simple new Monte Carlo algorithm for evaluating probabilities of observations in complex latent variable models, such as Deep Belief Networks. While the method is based on Markov chains, estimates based on short runs are formally unbiased. In expectation, the log probability of a test set will be underestimated, and this could form the basis of a probabilistic bound. The method is much cheaper than gold-standard annealing-based methods and only slightly more expensive than the cheapest Monte Carlo methods. We give examples of the new method substantially improving simple variational bounds at modest extra cost.
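The step from "formally unbiased" to "underestimated in expectation" is Jensen's inequality applied to the concave logarithm. Writing \hat{p}(v) for the estimate of the probability p(v) of one observation v (notation assumed here, not taken from the abstract):

```latex
\mathbb{E}\left[\hat{p}(v)\right] = p(v)
\quad\Longrightarrow\quad
\mathbb{E}\left[\log \hat{p}(v)\right]
\le \log \mathbb{E}\left[\hat{p}(v)\right]
= \log p(v)
```

Because the inequality holds for each test observation, it also holds for the sum of log estimates over a test set.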

artificial intelligence, deep learning, machine learning, (16 more...)

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

3D Object Recognition with Deep Belief Nets

Nair, Vinod, Hinton, Geoffrey E.

Neural Information Processing Systems, Dec-31-2009

We introduce a new type of Deep Belief Net and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error, making it the current best result for NORB.

artificial intelligence, level model, machine learning, (16 more...)

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)
