AITopics | Deep Learning


Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers. This capability distinguishes "deep learning" with such networks from earlier work on artificial neural nets.


Maxout Networks

Goodfellow, Ian J., Warde-Farley, David, Mirza, Mehdi, Courville, Aaron, Bengio, Yoshua

arXiv.org Machine Learning, Sep-20-2013

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed both to facilitate optimization by dropout and to improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state-of-the-art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
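A minimal sketch of the maxout definition above, assuming plain Python lists rather than any particular framework: one maxout unit returns the maximum over k affine functions of its input. The `dropout` helper uses the "inverted" convention (rescaling survivors during training), which matches scaling the weights at test time in expectation. It is not necessarily the paper's exact procedure, and all function names here are hypothetical.

```python
import random


def maxout_unit(x, weights, biases):
    """One maxout unit: the max over k affine pieces w_j . x + b_j.

    weights: list of k weight vectors, each the same length as x
    biases:  list of k scalar offsets
    """
    pieces = [sum(wi * xi for wi, xi in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return max(pieces)


def dropout(x, p_drop, rng, train=True):
    """Inverted dropout: during training, zero each input with probability
    p_drop and rescale survivors by 1 / (1 - p_drop); identity at test time."""
    if not train or p_drop == 0.0:
        return list(x)
    keep = 1.0 - p_drop
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


# Two pieces (w = +1 and w = -1, zero bias) make the unit compute |x|.
rng = random.Random(0)
h = dropout([-3.0], p_drop=0.5, rng=rng, train=False)
print(maxout_unit(h, [[1.0], [-1.0]], [0.0, 0.0]))  # 3.0
```

With k = 2 pieces, the unit already represents the absolute-value function. This reflects the piecewise-linear flexibility that lets maxout learn its own activation.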

artificial intelligence, dropout, machine learning

arXiv.org Machine Learning

1302.4389

Country: North America > Canada (0.68)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


Temporal Autoencoding Improves Generative Models of Time Series

Häusler, Chris, Susemihl, Alex, Nawrot, Martin P, Opper, Manfred

arXiv.org Machine Learning, Sep-12-2013

Restricted Boltzmann Machines (RBMs) are generative models that can learn useful representations from samples of a dataset in an unsupervised fashion. They have been widely employed as an unsupervised pre-training method in machine learning. RBMs have been modified to model time series in two main ways: the Temporal RBM (TRBM) stacks a number of RBMs laterally and introduces temporal dependencies between the hidden layer units; the Conditional RBM (CRBM), on the other hand, considers past samples of the dataset as a conditional bias and learns a representation that takes these into account. Here we propose a new training method for both the TRBM and the CRBM that enforces the dynamic structure of temporal datasets. We do so by treating the temporal models as denoising autoencoders, considering past frames of the dataset as corrupted versions of the present frame and minimizing the model's reconstruction error on the present data. We call this approach Temporal Autoencoding (TA). This leads to a significant improvement in the performance of both models in a filling-in-frames task across a number of datasets. The error reduction for motion capture data is 56% for the CRBM and 80% for the TRBM. Taking the posterior mean prediction instead of single samples further improves the model's estimates, decreasing the error by as much as 91% for the CRBM on motion capture data. We also trained the model to perform forecasting on a large number of datasets and found that TA pretraining consistently improves the forecasts. Furthermore, the prediction error across time shows that this improvement reflects a better representation of the dynamics of the data rather than a bias towards reconstructing the observed data on a short time scale.
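The Temporal Autoencoding objective treats past frames as a "corrupted" input and scores the model on reconstructing the present frame. The sketch below shows only that pairing and objective. As a loud assumption, it substitutes a hypothetical linear predictor trained by gradient descent for the TRBM/CRBM, so it is not the authors' training method.

```python
import math


def make_ta_pairs(series, n_past):
    """(past window, present frame) pairs: past frames act as the 'corrupted'
    input and the present frame is the reconstruction target."""
    return [(series[t - n_past:t], series[t]) for t in range(n_past, len(series))]


def predict(w, b, past):
    return sum(wi * xi for wi, xi in zip(w, past)) + b


def mse(pairs, w, b):
    return sum((predict(w, b, p) - y) ** 2 for p, y in pairs) / len(pairs)


def fit_linear_reconstructor(pairs, n_past, lr=0.3, epochs=500):
    """Hypothetical linear stand-in for a TRBM/CRBM: minimize the mean squared
    error of reconstructing the present frame, by full-batch gradient descent."""
    w, b = [0.0] * n_past, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_past, 0.0
        for past, present in pairs:
            err = predict(w, b, past) - present
            for i in range(n_past):
                grad_w[i] += 2.0 * err * past[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


series = [math.sin(1.0 * t) for t in range(102)]
pairs = make_ta_pairs(series, n_past=2)
w, b = fit_linear_reconstructor(pairs, n_past=2)
print(w, b, mse(pairs, w, b))
```

For a pure sinusoid the present frame is exactly predictable from two past frames, since sin(t) = 2cos(1)·sin(t−1) − sin(t−2). The fitted weights should therefore approach [−1, 2cos(1)]. Real motion capture data has no such closed form, which is where a generative temporal model becomes useful.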

artificial intelligence, deep learning, machine learning

arXiv.org Machine Learning

1309.3103

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)


Guided Self-Organization of Input-Driven Recurrent Neural Networks

Obst, Oliver, Boedecker, Joschka

arXiv.org Artificial Intelligence, Sep-5-2013

We review attempts that have been made towards understanding the computational properties and mechanisms of input-driven dynamical systems such as RNNs, and reservoir computing networks in particular. We provide details on methods that have been developed to give quantitative answers to questions about these properties and mechanisms. Following this, we show how self-organization may be used to improve reservoir performance, in some cases guided by the measures introduced earlier. We also present a possible way to quantify task performance using an information-theoretic approach, and finally discuss promising future directions aimed at a better understanding of how these systems perform their computations and how best to guide self-organized processes for their optimization.
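The sketch below shows a plain random reservoir of the echo-state-network kind, without the self-organization the review discusses. It is a minimal example under stated assumptions: scalar input, tanh units, and recurrent weights rescaled so their largest absolute row sum is 0.9. Because tanh is 1-Lipschitz, that bound makes the update a contraction. This is a conservative sufficient condition for the echo state property, not the only one used in practice, and all names are hypothetical.

```python
import math
import random


def make_reservoir(n, bound=0.9, seed=0):
    """Random recurrent weights rescaled so the largest absolute row sum
    (the infinity norm) equals `bound` < 1, plus random input weights."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    norm = max(sum(abs(v) for v in row) for row in W)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [[v * bound / norm for v in row] for row in W], w_in


def step(W, w_in, state, u):
    """One input-driven update: x <- tanh(W x + w_in * u)."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, state)) + wi * u)
            for row, wi in zip(W, w_in)]


def run(W, w_in, state, inputs):
    """Drive the reservoir with a scalar input sequence; return all states."""
    states = []
    for u in inputs:
        state = step(W, w_in, state, u)
        states.append(state)
    return states


W, w_in = make_reservoir(20)
inputs = [math.sin(0.2 * t) for t in range(200)]
# Same input, very different initial states: trajectories should converge.
a = run(W, w_in, [1.0] * 20, inputs)[-1]
b = run(W, w_in, [-1.0] * 20, inputs)[-1]
print(max(abs(x - y) for x, y in zip(a, b)))
```

The reservoir's state comes to depend only on the input history, not on its initial condition. In reservoir computing, the recurrent weights stay fixed and only a linear readout on these states is trained.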

dynamical system, information, reservoir

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-642-53734-9_11

1309.1524

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)


Pylearn2: a machine learning research library

Goodfellow, Ian J., Warde-Farley, David, Lamblin, Pascal, Dumoulin, Vincent, Mirza, Mehdi, Pascanu, Razvan, Bergstra, James, Bastien, Frédéric, Bengio, Yoshua

arXiv.org Machine Learning, Aug-19-2013

Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the library, an overview of its basic philosophy, a summary of the library's architecture, and a description of how the Pylearn2 community functions socially.

artificial intelligence, machine learning, pylearn2

arXiv.org Machine Learning

1308.4214

Country:

North America > United States (0.46)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

Anandkumar, Animashree, Hsu, Daniel, Janzamin, Majid, Kakade, Sham

arXiv.org Machine Learning, Aug-13-2013

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher-order" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models and requires the existence of a perfect matching from latent topics to higher-order observed words. We establish that random structured topic models are identifiable with high probability in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity that is contained in the class of Tucker decompositions but is more general than the Candecomp/Parafac (CP) decomposition.
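One ingredient of the conditions above is a perfect matching from latent topics to higher-order observed words. The toy sketch below checks for such a matching with Kuhn's augmenting-path algorithm for maximum bipartite matching. It assumes a hypothetical construction that links each topic to every ordered n-tuple of words in its support. That construction is an illustrative assumption, and the sketch does not implement the paper's full expansion conditions.

```python
from itertools import product


def max_matching(adj):
    """Size of a maximum bipartite matching via Kuhn's augmenting paths.

    adj[u] lists the right-hand nodes (any hashable) adjacent to left node u.
    """
    match_of = {}  # right node -> left node currently matched to it

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-routed elsewhere.
            if v not in match_of or augment(match_of[v], seen):
                match_of[v] = u
                return True
        return False

    return sum(1 for u in range(len(adj)) if augment(u, set()))


def order_n_support(topic_words, n):
    """Hypothetical 'higher-order' graph: link each topic to every ordered
    n-tuple of words drawn from its support."""
    return [list(product(sorted(words), repeat=n)) for words in topic_words]


# Four topics over a three-word vocabulary: an overcomplete toy example.
topics = [{0, 1}, {1, 2}, {0, 2}, {0, 1, 2}]
print(max_matching(order_n_support(topics, 1)))  # 3: cannot cover all 4 topics
print(max_matching(order_n_support(topics, 2)))  # 4: perfect matching on pairs
```

With only three words, four topics can never be matched one-to-one to single words. Over word pairs, however, a perfect matching exists. This mirrors why moving to higher-order observations can accommodate more topics than vocabulary words.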

machine learning, natural language, topic model

arXiv.org Machine Learning

1308.2853

Country: North America > United States (0.92)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)


Learning Features and their Transformations by Spatial and Temporal Spherical Clustering

Dutta, Jayanta K., Banerjee, Bonny

arXiv.org Artificial Intelligence, Aug-10-2013

Learning features invariant to arbitrary transformations in the data is a requirement for any recognition system, biological or artificial. It is now widely accepted that simple cells in the primary visual cortex respond to features, while complex cells respond to features invariant to different transformations. We present a novel two-layered feedforward neural model that learns features in the first layer by spatial spherical clustering and invariance to transformations in the second layer by temporal spherical clustering. Learning occurs in an online and unsupervised manner following the Hebbian rule. When exposed to natural videos acquired by a camera mounted on a cat's head, the first- and second-layer neurons in our model develop simple and complex cell-like receptive field properties. The model can also make predictions by learning lateral connections among the first-layer neurons. A topographic map of their spatial features emerges when the flow of activation between first-layer neurons that fire in close temporal proximity decays exponentially with the distance between them; this minimizes the pooling length online, simultaneously with feature learning.
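The first-layer idea of spatial spherical clustering with Hebbian learning can be sketched as generic online spherical k-means. Each input is normalized, the unit-norm centroid with the highest cosine similarity wins, and the winner takes a Hebbian-style step w ← w + ηx before being renormalized. This is an assumed, simplified rule rather than the authors' exact update, and the temporal second layer is not shown.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def spherical_step(centroids, x, lr=0.1):
    """One online update on the unit sphere.

    The winner is the centroid with the largest dot product (cosine
    similarity) with the normalized input; it takes a Hebbian-style step
    w <- w + lr * x and is renormalized to unit length. Returns the winner.
    """
    x = normalize(x)
    winner = max(range(len(centroids)),
                 key=lambda j: sum(c * xi for c, xi in zip(centroids[j], x)))
    centroids[winner] = normalize(
        [c + lr * xi for c, xi in zip(centroids[winner], x)])
    return winner


centroids = [normalize([0.8, 0.6]), normalize([0.6, 0.8])]
for _ in range(100):  # two input directions presented alternately
    spherical_step(centroids, [1.0, 0.05])
    spherical_step(centroids, [0.05, 1.0])
print(centroids)
```

Renormalizing after each Hebbian step keeps the weights from growing without bound. As a result, each centroid ends up pointing along the input direction it keeps winning.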

activation, neuron, simple neuron

arXiv.org Artificial Intelligence

1308.235

Country:

North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Deep Feature Learning Using Target Priors with Applications in ECoG Signal Decoding for BCI

Wang, Zuoguan (Rensselaer Polytechnic Institute) | Lyu, Siwei (University at Albany, SUNY) | Schalk, Gerwin (Wadsworth Center) | Ji, Qiang (Rensselaer Polytechnic Institute)

AAAI Conferences, Aug-3-2013

Recent years have seen great interest in using deep architectures for feature learning from data. One drawback of the commonly used unsupervised deep feature learning methods is that, for supervised or semi-supervised learning tasks, the information in the target variables is not used until the final stage, when the classifier or regressor is trained on the learned features. This can lead to over-generalized features that are not competitive on the specific supervised or semi-supervised learning task. In this work, we describe a new method for deep feature learning on mixed labeled and unlabeled data sets. Specifically, we describe a weakly supervised learning method, prior supervised convolutional stacked auto-encoders (PCSA), in which information in the target variables is represented probabilistically using a Gaussian-Bernoulli restricted Boltzmann machine (RBM). We apply this method to the decoding problem of an ECoG-based Brain Computer Interface (BCI) system. Our experimental results show that PCSA significantly improves decoding performance on benchmark data sets compared with unsupervised feature learning, as well as with current state-of-the-art algorithms based on manually crafted features.
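For reference, here is a minimal sketch of the Gaussian-Bernoulli RBM mentioned above, in one common parametrization with real-valued visible units v, binary hidden units h, and per-unit standard deviations σ:
E(v, h) = Σᵢ (vᵢ − bᵢ)² / (2σᵢ²) − Σⱼ cⱼhⱼ − Σᵢⱼ (vᵢ/σᵢ) Wᵢⱼ hⱼ.
How PCSA uses this model to encode target priors is not reproduced here. The check below only confirms that the closed-form free energy equals the brute-force sum over hidden states.

```python
import math
from itertools import product


def _visible_term(v, b, sigma):
    """Quadratic visible term: sum_i (v_i - b_i)^2 / (2 sigma_i^2)."""
    return sum((vi - bi) ** 2 / (2.0 * si ** 2) for vi, bi, si in zip(v, b, sigma))


def _hidden_input(v, W, c, sigma, j):
    """Total input to hidden unit j: c_j + sum_i (v_i / sigma_i) W_ij."""
    return c[j] + sum(v[i] / sigma[i] * W[i][j] for i in range(len(v)))


def gb_energy(v, h, W, b, c, sigma):
    """E(v, h) for real-valued v and binary h."""
    inter = sum(h[j] * _hidden_input(v, W, c, sigma, j) for j in range(len(h)))
    return _visible_term(v, b, sigma) - inter


def gb_free_energy(v, W, b, c, sigma):
    """F(v) = -log sum_h exp(-E(v, h)), with the binary h summed out."""
    return _visible_term(v, b, sigma) - sum(
        math.log1p(math.exp(_hidden_input(v, W, c, sigma, j)))
        for j in range(len(c)))


def gb_hidden_probs(v, W, c, sigma):
    """p(h_j = 1 | v) = logistic(c_j + sum_i (v_i / sigma_i) W_ij)."""
    return [1.0 / (1.0 + math.exp(-_hidden_input(v, W, c, sigma, j)))
            for j in range(len(c))]


v, b, sigma = [0.5, -1.0], [0.0, 0.2], [1.0, 0.5]
W, c = [[0.3, -0.2, 0.1], [0.4, 0.0, -0.5]], [0.1, -0.3, 0.2]
brute = -math.log(sum(math.exp(-gb_energy(v, list(h), W, b, c, sigma))
                      for h in product((0, 1), repeat=len(c))))
print(brute, gb_free_energy(v, W, b, c, sigma))  # the two should agree
```

The parameter values are arbitrary toy numbers. The hidden units factorize given v, so each one contributes a softplus term to the free energy.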

application, deep feature learning, ecog signal decoding

AAAI Conferences

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)


Conditional Restricted Boltzmann Machines for Negotiations in Highly Competitive and Complex Domains

Chen, Siqi (Maastricht University) | Ammar, Haitham Bou (Maastricht University) | Tuyls, Karl (Maastricht University) | Weiss, Gerhard (Maastricht University)

AAAI Conferences, Aug-3-2013

Learning in automated negotiations, while useful, is hard because of the indirect way in which the target function can be observed and the limited amount of experience available to learn from. This paper proposes two novel opponent modeling techniques based on deep learning methods. Moreover, to improve the learning efficacy of negotiating agents, the second approach is also capable of transferring knowledge efficiently between negotiation tasks. Transfer is conducted by automatically mapping the source knowledge to the target in a rich feature space. Experiments show that, using these techniques, the proposed strategies outperform existing state-of-the-art agents in highly competitive and complex negotiation domains. Furthermore, an empirical game-theoretic analysis reveals the robustness of the proposed strategies.
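The abstract does not spell out how the conditional restricted Boltzmann machine is wired for opponent modeling. The sketch below therefore shows only the generic CRBM conditioning step in one common formulation: past observations shift the hidden biases (through B) and the visible biases (through A), while the rest is an ordinary binary RBM. Treating negotiation history as the `past` vector is an assumption, and the matrices are hypothetical toy values.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def dynamic_bias(static, past, M):
    """static[j] + sum_k past[k] * M[k][j]: a bias shifted by past frames."""
    return [static[j] + sum(past[k] * M[k][j] for k in range(len(past)))
            for j in range(len(static))]


def crbm_hidden_probs(v, past, W, B, c):
    """p(h_j = 1 | v, past) with hidden biases conditioned on the past."""
    c_hat = dynamic_bias(c, past, B)
    return [sigmoid(c_hat[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def crbm_visible_probs(h, past, W, A, b):
    """p(v_i = 1 | h, past) for binary visibles, with past-conditioned biases."""
    b_hat = dynamic_bias(b, past, A)
    return [sigmoid(b_hat[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


# Hypothetical toy: 2 visible units, 2 hidden units, 1 past value.
W = [[0.0, 0.0], [0.0, 0.0]]
B = [[2.0, -2.0]]            # past -> hidden
A = [[1.0, 0.0]]             # past -> visible
print(crbm_hidden_probs([1.0, 0.0], [1.0], W, B, [0.0, 0.0]))
print(crbm_visible_probs([1.0, 0.0], [1.0], W, A, [0.0, 0.0]))
```

With W set to zero, any change in the probabilities comes only from the past-dependent biases. This isolates the conditioning mechanism.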

competitive and complex domain, conditional restricted boltzmann machine, negotiation

AAAI Conferences

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)


Knowledge Matters: Importance of Prior Information for Optimization

Gülçehre, Çağlar, Bengio, Yoshua

arXiv.org Machine Learning, Jul-13-2013

We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task that all of the state-of-the-art machine learning algorithms tested failed to learn. We motivate our work with the hypothesis that humans learn such intermediate concepts from other individuals via a form of supervision or guidance using a curriculum. The experiments we have conducted provide positive evidence in favor of this hypothesis. In our experiments, a two-tiered MLP architecture is trained on a dataset of 64×64 binary input images, each containing three sprites. The final task is to decide whether all the sprites are the same or one of them is different. The sprites are Pentomino Tetris shapes, placed at different locations in the image with scaling and rotation transformations. The first part of the two-tiered MLP is pre-trained with intermediate-level targets, namely the presence of sprites at each location, while the second part takes the output of the first part as input and predicts the final task's binary target. With a few tens of thousands of examples, the two-tiered MLP architecture learned the task perfectly, whereas all other algorithms (including unsupervised pre-training, but also traditional algorithms such as SVMs, decision trees, and boosting) performed no better than chance. We hypothesize that the optimization difficulty encountered when the intermediate pre-training is not performed is due to the composition of two highly non-linear tasks. Our findings are also consistent with hypotheses on cultural learning inspired by observed optimization problems in deep learning, presumably due to effective local minima.
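The two-tiered decomposition described above can be made concrete at the level of targets alone. Tier 1 records which sprite, if any, sits at each location, and tier 2 maps that record to the binary answer. In this minimal sketch, the 9-location grid, the label convention (1 = all sprites the same), and the class names are assumptions. Learning the pixel-to-tier-1 mapping, which is the hard part, is not shown.

```python
EMPTY = None  # hypothetical marker for a location with no sprite


def intermediate_targets(sprites, n_locations):
    """Tier-1 targets: the sprite class at each location, or EMPTY.

    sprites: dict mapping location index -> sprite class (hypothetical encoding).
    """
    return [sprites.get(loc, EMPTY) for loc in range(n_locations)]


def final_target(intermediate):
    """Tier-2 target: 1 if every sprite present has the same class, else 0."""
    present = [s for s in intermediate if s is not EMPTY]
    return int(len(present) > 0 and len(set(present)) == 1)


same = intermediate_targets({0: "P", 4: "P", 8: "P"}, n_locations=9)
diff = intermediate_targets({0: "P", 4: "L", 8: "P"}, n_locations=9)
print(final_target(same), final_target(diff))  # 1 0
```

Each tier is a simple function once the other is given. The abstract attributes the difficulty to learning their composition end to end without the intermediate targets.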

artificial intelligence, experiment, machine learning

arXiv.org Machine Learning

1301.4083

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.46)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Building on Deep Learning

Pickett, Marc (Naval Research Laboratory)

AAAI Conferences, Jul-9-2013

We propose using deep learning as the "workhorse" of a cognitive architecture. We show how deep learning can be leveraged to learn representations, such as a hierarchy of analogical schemas, from relational data. This approach to higher cognition drives some desiderata of deep learning, particularly modality independence and the ability to make top-down predictions. Finally, we consider the problem of how relational representations might be learned from sensor data that is not explicitly relational.

deep learning

AAAI Conferences

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
