AITopics | Deep Learning


Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.


Sports Video Classification from Multimodal Information Using Deep Neural Networks

Sachan, Devendra Singh (Indian Institute of Technology, Guwahati) | Tekwani, Umesh (Indian Institute of Technology, Guwahati) | Sethi, Amit (Indian Institute of Technology, Guwahati)

AAAI Conferences | Nov-14-2013

The work presents a methodology for classifying sports videos using both audio and visual information by applying deep learning algorithms. We show a methodology to combine multiple deep learning architectures through higher layers. Our method learns two separate models trained on the audio and visual parts of the data. We have trained the model for the audio part of the multimedia input using two stacked layers of CRBMs forming a CDBN. We also train a two-layered ISA network to extract features from the video part of the data. We then train a deep stacked autoencoder over both audio and visual features with discriminative fine-tuning. Our results show that combining audio and visual features yields better accuracy than using a single type of feature.

artificial intelligence, machine learning, sport video classification, (3 more...)

AAAI Conferences

2013 AAAI Fall Symposium Series

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Preface

Risi, Sebastian (IT University of Copenhagen) | Lehman, Joel (University of Texas at Austin) | Clune, Jeff (University of Wyoming)

AAAI Conferences | Nov-14-2013

Subfields of artificial intelligence often diversify from a core idea. For example, deep learning networks, models in computational neuroscience, and neuroevolution all take inspiration from biological neural networks as a potential pathway to AI. Most researchers choose to pursue the subfield (and by extension, abstraction) they see as most promising for leading to AI, which naturally results in significant debate and disagreement among researchers as to what abstraction is best. A better understanding and less polarized debate may result from a clear presentation and discussion of abstractions by their most knowledgeable proponents. These insights motivated bringing together researchers from fields that abstract AI at different levels or in different ways to disperse knowledge and to critically examine the value and promise of different abstractions. Thus this AAAI symposium, How Intelligence Should be Abstracted in AI, consisted of a diverse and multidisciplinary group of AI researchers interested in discussing and comparing different abstractions of both intelligence and processes that might create it.

deep learning, machine learning, preface, (1 more...)

AAAI Conferences

2013 AAAI Fall Symposium Series

Industry: Health & Medicine > Therapeutic Area > Neurology (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
Information Technology > Artificial Intelligence > Cognitive Science (0.53)


Mean Field Bayes Backpropagation: scalable training of multilayer neural networks with binary weights

Soudry, Daniel, Meir, Ron

arXiv.org Machine Learning | Oct-24-2013

Significant success has been reported recently using deep neural networks for classification. Such large networks can be computationally intensive, even after training is over. Implementing these trained networks in hardware chips with a limited precision of synaptic weights may improve their speed and energy efficiency by several orders of magnitude, thus enabling their integration into small and low-power electronic devices. With this motivation, we develop a computationally efficient learning algorithm for multilayer neural networks with binary weights, assuming all the hidden neurons have a fan-out of one. This algorithm, derived within a Bayesian probabilistic online setting, is shown to work well for both synthetic and real-world problems, performing comparably to algorithms with real-valued weights, while retaining computational tractability.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1310.1867

Country: Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


Provable Bounds for Learning Some Deep Representations

Arora, Sanjeev, Bhaskara, Aditya, Ge, Rong, Ma, Tengyu

arXiv.org Artificial Intelligence | Oct-23-2013

We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$-node multilayer neural net that has degree at most $n^{\gamma}$ for some $\gamma < 1$, and each edge has a random edge weight in $[-1,1]$. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1310.6343

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)


Optimally fuzzy temporal memory

Shankar, Karthik H., Howard, Marc W.

arXiv.org Artificial Intelligence | Oct-22-2013

Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register: a moving window extending into the past, requiring storage resources that grow linearly with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations, implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system.
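As a rough illustration of the shift-register baseline mentioned in the abstract above, the following Python sketch keeps the most recent samples in a fixed-size moving window. The class name `ShiftRegister` and its methods are hypothetical, chosen only for this example; the fuzzy memory system proposed in the paper is not reproduced here.

```python
from collections import deque


class ShiftRegister:
    """Fixed-size moving window over recent samples (hypothetical helper)."""

    def __init__(self, size):
        # One storage slot per time step: remembering `size` steps of the
        # past costs `size` slots, so storage grows linearly with timescale.
        self.buffer = deque(maxlen=size)

    def push(self, sample):
        # Appending to a full deque silently drops the oldest sample.
        self.buffer.append(sample)

    def recall(self, steps_back):
        """Return the sample from `steps_back` steps ago (0 = latest).

        Returns None when that sample is outside the stored window.
        """
        if steps_back < 0 or steps_back >= len(self.buffer):
            return None
        return self.buffer[-1 - steps_back]


memory = ShiftRegister(size=4)
for t in range(10):
    memory.push(t)

print(memory.recall(0))  # latest sample
print(memory.recall(3))  # oldest sample still stored
print(memory.recall(4))  # already forgotten
```

The sketch makes the trade-off concrete: anything older than the window length is lost entirely, which is the limitation a scale-free memory is meant to soften.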

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

1211.5189

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Hardware (0.99)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Distributed Representations of Words and Phrases and their Compositionality

Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg, Dean, Jeffrey

arXiv.org Machine Learning | Oct-16-2013

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
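The abstract names two techniques without giving formulas. The Python sketch below assumes the forms given in the body of the paper rather than in the text above: a word with relative frequency f is discarded with probability 1 - sqrt(t / f), and the negative-sampling objective for one training pair sums log-sigmoid terms over one observed context word and k sampled noise words. The function names, the plain-list vectors, and the default threshold `t=1e-5` are choices made for this illustration, not the authors' code.

```python
import math


def discard_prob(freq, t=1e-5):
    """Probability of dropping one occurrence of a word.

    `freq` is the word's relative frequency in the corpus; `t` is the
    subsampling threshold (assumed default for this sketch).
    """
    return max(0.0, 1.0 - math.sqrt(t / freq))


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(v_in, v_out_pos, v_out_negs):
    """Negative-sampling loss for a single (input word, context word) pair.

    v_in       -- input vector of the centre word
    v_out_pos  -- output vector of the observed context word
    v_out_negs -- output vectors of the sampled noise words
    """
    loss = -log_sigmoid(dot(v_out_pos, v_in))
    for v_neg in v_out_negs:
        loss -= log_sigmoid(-dot(v_neg, v_in))
    return loss


# A frequent word is dropped most of the time, a rare one never.
print(discard_prob(1e-3))
print(discard_prob(1e-6))
```

Each update touches only the k + 1 output vectors involved, rather than every word in the vocabulary, which is where the training-speed benefit comes from. How the noise words are drawn is left out of the sketch.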

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1310.4546

Country:

North America > Canada (1.00)
Europe (1.00)
Asia (1.00)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.70)
Leisure & Entertainment > Sports > Hockey (0.69)
Leisure & Entertainment > Sports > Basketball (0.68)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Deep Multiple Kernel Learning

Strobl, Eric, Visweswaran, Shyam

arXiv.org Machine Learning | Oct-11-2013

Deep learning methods construct new features by transforming the input data through multiple layers of nonlinear processing. This has conventionally been accomplished by training a large artificial neural network with several hidden layers. However, the method has been limited to datasets with very large sample sizes, such as the MNIST dataset, which contains 60,000 training samples. More recently, there has been a drive to apply deep learning to datasets with more limited sample sizes, as is typical in many real-world situations. Kernel methods have been particularly successful on a variety of sample sizes because they can enable a classifier to learn a complex decision boundary with only a few parameters by projecting the data onto a high-dimensional reproducing kernel Hilbert space.
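To make the kernel idea in the abstract above concrete, here is a minimal Python sketch that evaluates a Gaussian (RBF) kernel and builds the Gram matrix for a handful of points. The choice of the RBF kernel, the `gamma` parameter, and the function names are assumptions made for illustration; the abstract does not say which kernels the paper uses, and this is not its multiple kernel learning procedure.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=1.0):
    """Pairwise kernel values; entry [i][j] is k(points[i], points[j])."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, gamma=0.5)

for row in K:
    print(["%.4f" % value for value in row])
```

A kernel classifier works only with the entries of `K`, so the high-dimensional feature space is never constructed explicitly.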

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/ICMLA.2013.84

1310.3101

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Understanding Boltzmann Machine and Deep Learning via A Confident Information First Principle

Zhao, Xiaozhao, Hou, Yuexian, Yu, Qian, Song, Dawei, Li, Wenjie

arXiv.org Machine Learning | Oct-9-2013

Typical dimensionality reduction methods focus on directly reducing the number of random variables while retaining maximal variations in the data. In this paper, we consider the dimensionality reduction in parameter spaces of binary multivariate distributions. We propose a general Confident-Information-First (CIF) principle to maximally preserve parameters with confident estimates and rule out unreliable or noisy parameters. Formally, the confidence of a parameter can be assessed by its Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cramér-Rao bound. We then revisit Boltzmann machines (BM) and theoretically show that both single-layer BM without hidden units (SBM) and restricted BM (RBM) can be solidly derived using the CIF principle. This can not only help us uncover and formalize the essential parts of the target density that SBM and RBM capture, but also suggest that the deep neural network consisting of several layers of RBM can be seen as the layer-wise application of CIF. Guided by the theoretical analysis, we develop a sample-specific CIF-based contrastive divergence (CD-CIF) algorithm for SBM and a CIF-based iterative projection procedure (IP) for RBM. Both CD-CIF and IP are studied in a series of density estimation experiments.
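The link between Fisher information and estimator variance can be shown on the simplest binary case. The Python sketch below assumes a single Bernoulli parameter p estimated from n independent samples, a textbook setting chosen for illustration and not the multivariate binary distributions or the CIF procedure studied in the paper; the function names are hypothetical.

```python
def fisher_information_bernoulli(p):
    """Fisher information of one Bernoulli(p) observation: 1 / (p * (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))


def cramer_rao_bound(p, n):
    """Lower bound on the variance of any unbiased estimator of p
    built from n independent samples: 1 / (n * I(p))."""
    return 1.0 / (n * fisher_information_bernoulli(p))


def sample_mean_variance(p, n):
    """Exact variance of the sample mean of n Bernoulli(p) draws."""
    return p * (1.0 - p) / n


for p in (0.5, 0.1):
    print(p, cramer_rao_bound(p, 100), sample_mean_variance(p, 100))
```

In this setting the sample mean attains the bound exactly, so a larger Fisher information means a tighter achievable estimate, which is the sense of "confidence" the abstract appeals to.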

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Machine Learning

1302.3931

Country:

North America (0.67)
Asia > China (0.46)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)


Discriminative Features via Generalized Eigenvectors

Karampatziakis, Nikos, Mineiro, Paul

arXiv.org Machine Learning | Oct-7-2013

Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system. In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second-order structure in the data. We focus on multiclass classification and show that features extracted from the generalized eigenvectors of the class-conditional second moments lead to classifiers with excellent empirical performance. Moreover, these features have attractive theoretical properties, such as inducing representations that are invariant to linear transformations of the input. We evaluate classifiers built from these features on three different tasks, obtaining state-of-the-art results.

eigenvector, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1310.1934

Country: North America > United States (0.46)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Modeling Documents with Deep Boltzmann Machines

Srivastava, Nitish, Salakhutdinov, Ruslan R, Hinton, Geoffrey E.

arXiv.org Machine Learning | Sep-26-2013

We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

artificial intelligence, machine learning, replicated softmax model, (18 more...)

arXiv.org Machine Learning

1309.6865

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)
