AITopics | Deep Learning

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

Advanced Mean Field Theory of Restricted Boltzmann Machine

Huang, Haiping, Toyoizumi, Taro

arXiv.org Machine Learning, May-1-2015

Learning in a restricted Boltzmann machine is typically hard because it requires computing gradients of the log-likelihood function. To describe the network state statistics of the restricted Boltzmann machine, we develop an advanced mean field theory based on the Bethe approximation. Our theory provides an efficient message-passing method that evaluates not only the partition function (free energy) but also its gradients without requiring statistical sampling. The results are compared with those obtained by the computationally expensive sampling-based method.

artificial intelligence, machine learning, node, (17 more...)

doi: 10.1103/PhysRevE.91.050101

1502.00186

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

A Group Theoretic Perspective on Unsupervised Deep Learning

Paul, Arnab, Venkatasubramanian, Suresh

arXiv.org Machine Learning, Apr-21-2015

Why does deep learning work? What representations does it capture? How do higher-order representations emerge? We study these questions from the perspective of group theory, thereby opening a new approach towards a theory of deep learning. One factor behind the recent resurgence of the subject is a key algorithmic step called pretraining: first search for a good generative model for the input samples, and repeat the process one layer at a time. We show deeper implications of this simple principle by establishing a connection with the interplay of orbits and stabilizers of group actions. Although the neural networks themselves may not form groups, we show the existence of shadow groups whose elements serve as close approximations. Over the shadow groups, the pretraining step, originally introduced as a mechanism to better initialize a network, becomes equivalent to a search for features with minimal orbits. Intuitively, these features are in a way the simplest, which explains why a deep learning network learns simple features first. Next, we show how the same principle, when repeated in the deeper layers, can capture higher-order representations, and why representation complexity increases as the layers get deeper.

artificial intelligence, deep learning, machine learning, (16 more...)

1504.02462

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning Activation Functions to Improve Deep Neural Networks

Agostinelli, Forest, Hoffman, Matthew, Sadowski, Peter, Baldi, Pierre

arXiv.org Machine Learning, Apr-21-2015

Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
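As a rough illustration of what a per-neuron learned piecewise linear activation could look like, the sketch below adds a sum of learnable hinge terms to a rectified linear unit and fits the hinge parameters by gradient descent. The hinge parameterization, the names `apl`, `apl_grads`, `loss`, and `fit`, and the toy target are assumptions made for illustration; the abstract itself does not specify the functional form.

```python
def apl(x, a, b):
    """ReLU plus a sum of hinges; a (slopes) and b (hinge locations) are learnable."""
    return max(0.0, x) + sum(a_s * max(0.0, -x + b_s) for a_s, b_s in zip(a, b))


def apl_grads(x, a, b):
    """Partial derivatives of apl(x, a, b) with respect to each a_s and b_s."""
    grad_a = [max(0.0, -x + b_s) for b_s in b]
    grad_b = [a_s if (-x + b_s) > 0 else 0.0 for a_s, b_s in zip(a, b)]
    return grad_a, grad_b


def loss(xs, ys, a, b):
    """Mean squared error of the activation against target outputs."""
    return sum((apl(x, a, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, a, b, lr=0.05, steps=200):
    """Full-batch gradient descent on the activation parameters of one neuron."""
    a, b = list(a), list(b)
    for _ in range(steps):
        grad_a = [0.0] * len(a)
        grad_b = [0.0] * len(b)
        for x, y in zip(xs, ys):
            err = 2.0 * (apl(x, a, b) - y) / len(xs)
            da, db = apl_grads(x, a, b)
            for s in range(len(a)):
                grad_a[s] += err * da[s]
                grad_b[s] += err * db[s]
        a = [a_s - lr * g for a_s, g in zip(a, grad_a)]
        b = [b_s - lr * g for b_s, g in zip(b, grad_b)]
    return a, b


# Toy usage: learn an activation that approximates |x| on a small grid.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [abs(x) for x in xs]
a_fit, b_fit = fit(xs, ys, a=[0.2], b=[0.1])
```

In a full network each neuron would carry its own `a` and `b`, updated alongside the weights; here a single neuron is fitted in isolation to keep the sketch short.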

artificial intelligence, deep learning, machine learning, (14 more...)

1412.683

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Convolutional Neural Networks Based on Semi-Discrete Frames

Wiatowski, Thomas, Bölcskei, Helmut

arXiv.org Machine Learning, Apr-21-2015

Deep convolutional neural networks have led to breakthrough results in practical feature extraction applications. The mathematical analysis of these networks was pioneered by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on identical semi-discrete wavelet frames in each network layer, and proved translation-invariance as well as deformation stability of the resulting feature extractor. The purpose of this paper is to develop Mallat's theory further by allowing for different and, most importantly, general semi-discrete frames (such as Gabor frames, wavelets, curvelets, shearlets, ridgelets) in distinct network layers. This makes it possible to extract wider classes of features than the point singularities resolved by the wavelet transform. Our generalized feature extractor is proven to be translation-invariant, and we develop deformation stability results for a larger class of deformations than those considered by Mallat. For Mallat's wavelet-based feature extractor, we remove a number of technical conditions. The mathematical engine behind our results is continuous frame theory, which allows us to completely detach the invariance and deformation stability proofs from the particular algebraic structure of the underlying frames.

artificial intelligence, feature extractor, machine learning, (17 more...)

1504.05487

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Gap Analysis of Natural Language Processing Systems with respect to Linguistic Modality

Shukla, Vishal

arXiv.org Artificial Intelligence, Apr-18-2015

Modality is one of the important components of grammar in linguistics. It lets a speaker express an attitude towards, or assess the potentiality of, a state of affairs. It implies different senses and is therefore perceived differently depending on the context. This paper presents an account showing the gap in the functionality of current state-of-the-art Natural Language Processing (NLP) systems. The contextual nature of linguistic modality is studied. The works and logical approaches employed by NLP systems dealing with modality are reviewed. The paper views human cognition and intelligence as a multi-layered approach that can be implemented by intelligent systems for learning. Lastly, the current direction of research in this field is discussed, along with an outlook on future work.

artificial intelligence, machine learning, natural language, (16 more...)

1504.04716

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Compositional Distributional Semantics with Long Short Term Memory

Le, Phong, Zuidema, Willem

arXiv.org Artificial Intelligence, Apr-17-2015

We propose an extension of the recursive neural network that makes use of a variant of the long short-term memory architecture. The extension allows information low in parse trees to be stored in a memory register (the "memory cell") and used much later, higher up in the parse tree. This provides a solution to the vanishing gradient problem and allows the network to capture long-range dependencies. Experimental results show that our composition outperformed the traditional neural-network composition on the Stanford Sentiment Treebank.

artificial intelligence, machine learning, natural language, (19 more...)

1503.0251

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Random Forests Can Hash

Qiu, Qiang, Sapiro, Guillermo, Bronstein, Alex

arXiv.org Machine Learning, Apr-16-2015

Hash codes are a very efficient data representation, needed to cope with the ever-growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forest, a technique that together with deep learning has shown spectacular results in classification, can also be extended to large-scale retrieval. Traditional random forest fails to enforce the consistency of hashes generated from each tree for the same class data, i.e., to preserve the underlying similarity, and it also lacks a principled way for code aggregation across trees. We start with a simple hashing scheme, where independently trained random trees in a forest act as hashing functions. We then propose a subspace model as the splitting function, and show that it enforces the hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.
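The simple starting scheme, in which each tree acts as a hash function, can be sketched as follows. This is only a toy under stated assumptions: the trees use untrained random axis-aligned splits on features in [0, 1], the per-tree code is the sequence of left/right decisions, and codes are simply concatenated. The subspace splitting function and the information-theoretic aggregation described in the abstract are not implemented, and all function names are hypothetical.

```python
import random


def build_random_tree(depth, n_features, rng):
    """Complete binary tree in heap order: node index -> (feature, threshold)."""
    n_internal = 2 ** depth - 1
    return {
        node: (rng.randrange(n_features), rng.uniform(0.0, 1.0))
        for node in range(n_internal)
    }


def tree_code(tree, depth, x):
    """Route x from the root to a leaf and record each left/right decision as a bit."""
    node, bits = 0, []
    for _ in range(depth):
        feature, threshold = tree[node]
        bit = 1 if x[feature] > threshold else 0
        bits.append(bit)
        node = 2 * node + 1 + bit  # left child for bit 0, right child for bit 1
    return bits


def build_forest(n_trees, depth, n_features, rng):
    """A forest is a list of independently generated trees."""
    return [build_random_tree(depth, n_features, rng) for _ in range(n_trees)]


def forest_hash(forest, depth, x):
    """Concatenate the per-tree codes into one binary hash code."""
    code = []
    for tree in forest:
        code.extend(tree_code(tree, depth, x))
    return code


def hamming(code_a, code_b):
    """Number of positions where two equal-length codes differ."""
    return sum(1 for u, v in zip(code_a, code_b) if u != v)


# Toy usage: hash two points and compare their codes.
rng = random.Random(0)
forest = build_forest(n_trees=4, depth=3, n_features=2, rng=rng)
code_p = forest_hash(forest, 3, [0.20, 0.90])
code_q = forest_hash(forest, 3, [0.21, 0.88])
distance = hamming(code_p, code_q)
```

Retrieval would then rank database items by Hamming distance to the query code. With random splits nothing forces points of the same class to share codes, which is the consistency problem the abstract says the proposed splitting function addresses.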

artificial intelligence, machine learning, random forest, (16 more...)

1412.5083

Country: North America > Canada (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

Sindhwani, Vikas, Avron, Haim

arXiv.org Machine Learning, Apr-16-2015

In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the underlying statistical dependencies. Kernel methods fit this need well, as they constitute a versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems. However, their high computational costs (in storage and time) pose a significant barrier to their widespread adoption in big data applications. We propose an algorithmic framework and high-performance implementation for massive-scale training of kernel-based statistical models, based on combining two key technical ingredients: (i) distributed general purpose convex optimization, and (ii) the use of randomization to improve the scalability of kernel methods. Our approach is based on a block-splitting variant of the Alternating Directions Method of Multipliers, carefully reconfigured to handle very large random feature matrices, while exploiting hybrid parallelism typically found in modern clusters of multicore machines. Our implementation supports a variety of statistical learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in a highly extensible framework. We evaluate the ability of our framework to learn models on data from applications, and provide a comparison against existing sequential and parallel libraries.
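One widely used way to apply randomization to kernel methods is random Fourier features, where an explicit random feature map replaces the kernel so that inner products of features approximate kernel values. The sketch below assumes a Gaussian kernel and this particular construction; the abstract does not say which randomized approximation is used, and the names `make_rff` and `gaussian_kernel` are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian (RBF) kernel value between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff(dim, n_features, sigma, rng):
    """Return a feature map z with z(x) . z(y) approximating the Gaussian kernel."""
    # Frequencies are drawn from N(0, 1/sigma^2), phases uniformly from [0, 2*pi].
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(ws, bs)
        ]

    return features


# Toy usage: compare the approximate and exact kernel values for one pair of points.
rng = random.Random(0)
phi = make_rff(dim=3, n_features=4000, sigma=1.0, rng=rng)
x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
approx = sum(u * v for u, v in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y, sigma=1.0)
```

Once the data are mapped through `phi`, a kernel model becomes a linear model over a large feature matrix, which is the kind of problem a distributed convex solver can split into blocks.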

algorithm, artificial intelligence, machine learning, (20 more...)

1409.094

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

Neyshabur, Behnam, Tomioka, Ryota, Srebro, Nathan

arXiv.org Artificial Intelligence, Apr-16-2015

We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multi-layer feedforward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.

artificial intelligence, machine learning, regularization, (18 more...)

1412.6614

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

A Generative Model for Deep Convolutional Learning

Pu, Yunchen, Yuan, Xin, Carin, Lawrence

arXiv.org Machine Learning, Apr-15-2015

A generative model is developed for deep (multi-layered) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic learning. Experimental results demonstrate powerful capabilities of the model to learn multi-layer features from images, and excellent classification results are obtained on the MNIST and Caltech 101 datasets.

machine learning, natural language, pixel, (15 more...)

1504.04054

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
