Deep Learning
Sports Video Classification from Multimodal Information Using Deep Neural Networks
Sachan, Devendra Singh (Indian Institute of Technology, Guwahati) | Tekwani, Umesh (Indian Institute of Technology, Guwahati) | Sethi, Amit (Indian Institute of Technology, Guwahati)
The work presents a methodology for classifying sports videos using both audio and visual information by applying deep learning algorithms. We show how to combine multiple deep learning architectures through higher layers. Our method learns two separate models, trained on the audio and visual parts of the data. We train the model for the audio part of the multimedia input using two stacked layers of convolutional restricted Boltzmann machines (CRBMs) forming a convolutional deep belief network (CDBN). We also train a two-layer independent subspace analysis (ISA) network to extract features from the video part of the data. We then train a deep stacked autoencoder over both audio and visual features, with discriminative fine-tuning. Our results show that combining audio and visual features gives better accuracy than using either type of feature alone.
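The fusion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Random matrices stand in for the CDBN audio features and the ISA video features. Each modality is standardized and the two are concatenated per clip. A single sigmoid autoencoder layer is then trained on the joint vector by plain gradient descent. Stacking further layers and the discriminative fine-tuning stage are omitted. The function names `fuse` and `train_autoencoder_layer` are hypothetical.

```python
import numpy as np

# Hypothetical helpers; random matrices stand in for the real CDBN (audio)
# and ISA (video) features, which are assumed to be precomputed per clip.

def standardize(x, eps=1e-8):
    # Zero mean, unit variance per dimension so neither modality dominates by scale.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def fuse(audio_feats, video_feats):
    # Feature-level fusion: one joint vector per clip (rows must be aligned by clip).
    return np.hstack([standardize(audio_feats), standardize(video_feats)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(x, n_hidden, lr=0.1, epochs=300, seed=0):
    # One sigmoid encoder + linear decoder, trained on mean squared reconstruction error.
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ W1 + b1)                # joint audio-visual code
        err = h @ W2 + b2 - x                   # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size            # d(loss)/d(reconstruction)
        g_pre = (g_out @ W2.T) * h * (1.0 - h)  # backprop through the sigmoid
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (x.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return (W1, b1), losses

rng = np.random.default_rng(1)
audio = rng.normal(size=(50, 12))   # stand-in audio features: 50 clips x 12 dims
video = rng.normal(size=(50, 8))    # stand-in video features: 50 clips x 8 dims
joint = fuse(audio, video)
(encoder_W, encoder_b), losses = train_autoencoder_layer(joint, n_hidden=8)
```

The code `sigmoid(joint @ encoder_W + encoder_b)` would then feed a further layer or a classifier. Standardizing each modality before concatenation is a design choice. Its purpose is to keep the modality with more dimensions or larger raw scale from dominating the reconstruction loss.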
Preface
Risi, Sebastian (IT University of Copenhagen) | Lehman, Joel (University of Texas at Austin) | Clune, Jeff (University of Wyoming)
Subfields of artificial intelligence often diversify from a core idea. For example, deep learning networks, models in computational neuroscience, and neuroevolution all take inspiration from biological neural networks as a potential pathway to AI. Most researchers choose to pursue the subfield (and by extension, abstraction) they see as most promising for leading to AI, which naturally results in significant debate and disagreement among researchers as to which abstraction is best. A better understanding and a less polarized debate may result from a clear presentation and discussion of abstractions by their most knowledgeable proponents. These insights motivated bringing together researchers from fields that abstract AI at different levels or in different ways, to disseminate knowledge and to critically examine the value and promise of different abstractions. Thus this AAAI symposium, How Intelligence Should be Abstracted in AI, brought together a diverse and multidisciplinary group of AI researchers interested in discussing and comparing different abstractions of both intelligence and the processes that might create it.
Mean Field Bayes Backpropagation: scalable training of multilayer neural networks with binary weights
Significant success has been reported recently using deep neural networks for classification. Such large networks can be computationally intensive, even after training is over. Implementing these trained networks in hardware chips with a limited precision of synaptic weights may improve their speed and energy efficiency by several orders of magnitude, thus enabling their integration into small and low-power electronic devices. With this motivation, we develop a computationally efficient learning algorithm for multilayer neural networks with binary weights, assuming all the hidden neurons have a fan-out of one. This algorithm, derived within a Bayesian probabilistic online setting, is shown to work well for both synthetic and real-world problems, performing comparably to algorithms with real-valued weights, while retaining computational tractability.
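Mean-field treatments of binary-weight networks commonly rely on one quantity: the probability that a neuron's preactivation is positive when its weights are random. The sketch below computes it under two assumptions. The weights are independent, take values in {-1, +1}, and have means `m_i`. The summed input is approximated as Gaussian. This is an illustrative assumption about this family of methods, not the paper's full Bayesian online update. The name `prob_positive_preactivation` is hypothetical.

```python
import math
import numpy as np

def prob_positive_preactivation(x, m):
    """Gaussian (central-limit) approximation of P(sum_i w_i * x_i > 0) when the
    w_i are independent binary weights in {-1, +1} with means m_i = E[w_i]."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    mu = float(m @ x)                             # mean of the preactivation
    var = float(np.sum((1.0 - m ** 2) * x ** 2))  # Var[w_i] = 1 - m_i**2 for w_i in {-1, +1}
    if var == 0.0:
        # All weights deterministic: the sign is known exactly (0.5 on an exact tie).
        if mu > 0:
            return 1.0
        return 0.0 if mu < 0 else 0.5
    # Standard normal CDF evaluated at mu / sqrt(var).
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0 * var)))
```

When every `m_i` is 0, the weights are fair coin flips and the probability is exactly 0.5. When every `m_i` is +1 or -1, the weights are fixed and the sign is known exactly.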
Provable Bounds for Learning Some Deep Representations
Arora, Sanjeev, Bhaskara, Aditya, Ge, Rong, Ma, Tengyu
We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$ node multilayer neural net that has degree at most $n^{\gamma}$ for some $\gamma <1$ and each edge has a random edge weight in $[-1,1]$. Our algorithm learns almost all networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.
Optimally fuzzy temporal memory
Shankar, Karthik H., Howard, Marc W.
Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register: a moving window extending into the past, requiring storage resources that grow linearly with the timescale to be represented. However, an independent general-purpose learner cannot know a priori the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long-range correlations, implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general-purpose learner would be better off committing to such a fuzzy memory system.
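The trade-off between a shift register and a scale-free fuzzy memory can be illustrated with a fixed storage budget. In this crude sketch (our assumption, not the paper's construction), slot k of the fuzzy memory stores the average of the samples at lags 2**k - 1 through 2**(k+1) - 2. Bin width therefore doubles with lag, so n slots reach back 2**n - 1 steps. A shift register with n slots reaches back only n steps. The function names are hypothetical.

```python
import numpy as np

def shift_register(history, n_slots):
    # Exact copies of the n_slots most recent samples, most recent (lag 0) first.
    return history[::-1][:n_slots].copy()

def fuzzy_memory(history, n_slots):
    # Hypothetical scale-free buffer: slot k averages lags [2**k - 1, 2**(k+1) - 1),
    # i.e. widths 1, 2, 4, ..., so accuracy degrades in proportion to lag.
    recent_first = history[::-1]
    slots = np.full(n_slots, np.nan)  # NaN marks slots reaching past the start of history
    for k in range(n_slots):
        window = recent_first[2 ** k - 1 : 2 ** (k + 1) - 1]
        if window.size:
            slots[k] = window.mean()
    return slots

history = np.arange(100.0)           # toy signal, oldest sample first
print(shift_register(history, 4))    # 4 slots covering lags 0..3
print(fuzzy_memory(history, 4))      # 4 slots covering lags 0..14
```

Both buffers use the same number of slots. The fuzzy one trades exact values at long lags for coarse averages that still reflect what happened there.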
Distributed Representations of Words and Phrases and their Compositionality
Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg, Dean, Jeffrey
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
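The three techniques named in the abstract reduce to short formulas. The sketch below writes them out as we understand them from the Skip-gram extensions:

- subsampling, which discards each occurrence of word w with probability 1 - sqrt(t / f(w));
- drawing negatives from the unigram distribution raised to the 3/4 power;
- the negative-sampling objective for one (input, context) pair;
- a bigram score (count(ab) - δ) / (count(a)·count(b)) used for phrase detection.

The values t = 1e-5 and δ = 5 are illustrative defaults, not prescriptions. The function names are hypothetical.

```python
import numpy as np

def discard_prob(freq, t=1e-5):
    # Subsampling: drop an occurrence of word w with P = 1 - sqrt(t / f(w)),
    # where f(w) is its relative frequency; words with f(w) <= t are never dropped.
    return np.maximum(0.0, 1.0 - np.sqrt(t / np.asarray(freq, dtype=float)))

def noise_distribution(counts, power=0.75):
    # Unigram counts raised to the 3/4 power and renormalized (flattens the distribution).
    p = np.asarray(counts, dtype=float) ** power
    return p / p.sum()

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    return -np.logaddexp(0.0, -z)

def neg_sampling_objective(v_in, v_out, v_negs):
    # log s(v'_out . v_in) + sum_k log s(-v'_k . v_in), to be maximized;
    # v_negs holds one sampled negative output vector per row.
    return log_sigmoid(v_out @ v_in) + log_sigmoid(-(v_negs @ v_in)).sum()

def phrase_score(count_ab, count_a, count_b, delta=5.0):
    # Bigrams "a b" scoring above a chosen threshold are merged into one token;
    # delta discounts pairs built from very infrequent words.
    return (count_ab - delta) / (count_a * count_b)

rng = np.random.default_rng(0)
noise = noise_distribution([1000, 100, 10, 1])
negatives = rng.choice(len(noise), size=5, p=noise)  # indices of 5 negative words
```

Raising counts to the 3/4 power makes rare words more likely to be drawn as negatives than their raw frequency alone would.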
Deep Multiple Kernel Learning
Strobl, Eric, Visweswaran, Shyam
Deep learning methods construct new features by transforming the input data through multiple layers of nonlinear processing. This has conventionally been accomplished by training a large artificial neural network with several hidden layers. However, the method has been limited to datasets with very large sample sizes, such as the MNIST dataset, which contains 60,000 training samples. More recently, there has been a drive to apply deep learning to datasets with more limited sample sizes, as is typical in many real-world situations. Kernel methods have been particularly successful on a variety of sample sizes because they can enable a classifier to learn a complex decision boundary with only a few parameters by projecting the data onto a high-dimensional reproducing kernel Hilbert space.
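Kernels can be stacked into layers in a simple way; this is an illustrative assumption, not the authors' architecture. Each layer applies base kernels to the feature space induced by the previous layer's kernel and mixes them with nonnegative weights. Only the Gram matrix is needed, because the induced squared distance is K(x,x) - 2K(x,y) + K(y,y). The weights below are fixed by hand, whereas a multiple-kernel-learning method would fit them to the data. The name `compose_layer` is hypothetical.

```python
import numpy as np

def compose_layer(K, weights, gamma=1.0, degree=2, c=1.0):
    """Build the next layer's Gram matrix from the previous one (hypothetical sketch).
    Each base kernel acts on the feature space induced by K; a nonnegative
    weighted sum of positive semidefinite kernels is again positive semidefinite."""
    diag = np.diag(K)
    # Squared distance between induced feature vectors, clipped for round-off.
    sq_dist = np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0)
    base = [
        K,                         # linear kernel in the induced space
        (K + c) ** degree,         # polynomial kernel (elementwise power)
        np.exp(-gamma * sq_dist),  # Gaussian RBF kernel
    ]
    return sum(w * B for w, B in zip(weights, base))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K0 = X @ X.T                                      # layer 0: linear kernel on raw inputs
K1 = compose_layer(K0, weights=(0.4, 0.3, 0.3))
K2 = compose_layer(K1, weights=(0.2, 0.2, 0.6), gamma=0.1)
```

The final Gram matrix can be passed to any kernel classifier that accepts a precomputed kernel. Keeping every weight nonnegative is what preserves positive semidefiniteness from layer to layer.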
Understanding Boltzmann Machine and Deep Learning via A Confident Information First Principle
Zhao, Xiaozhao, Hou, Yuexian, Yu, Qian, Song, Dawei, Li, Wenjie
Typical dimensionality reduction methods focus on directly reducing the number of random variables while retaining maximal variations in the data. In this paper, we consider dimensionality reduction in the parameter spaces of binary multivariate distributions. We propose a general Confident-Information-First (CIF) principle to maximally preserve parameters with confident estimates and rule out unreliable or noisy parameters. Formally, the confidence of a parameter can be assessed by its Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cramér-Rao bound. We then revisit Boltzmann machines (BM) and theoretically show that both the single-layer BM without hidden units (SBM) and the restricted BM (RBM) can be soundly derived using the CIF principle. This not only helps us uncover and formalize the essential parts of the target density that SBM and RBM capture, but also suggests that a deep neural network consisting of several layers of RBMs can be seen as a layer-wise application of CIF. Guided by the theoretical analysis, we develop a sample-specific CIF-based contrastive divergence (CD-CIF) algorithm for SBM and a CIF-based iterative projection procedure (IP) for RBM. Both CD-CIF and IP are studied in a series of density estimation experiments.
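The link between Fisher information and the Cramér-Rao bound can be seen in a one-parameter case. This is our own worked example, not one taken from the paper. Take a single binary variable with mean μ, estimated from N independent samples:

```latex
\begin{aligned}
p(x;\mu) &= \mu^{x}(1-\mu)^{1-x}, \qquad x \in \{0,1\},\\
\frac{\partial}{\partial \mu}\log p(x;\mu) &= \frac{x}{\mu} - \frac{1-x}{1-\mu},\\
I(\mu) &= \mu \cdot \frac{1}{\mu^{2}} + (1-\mu)\cdot\frac{1}{(1-\mu)^{2}} = \frac{1}{\mu(1-\mu)},\\
\operatorname{Var}(\hat{\mu}) &\ge \frac{1}{N\,I(\mu)} = \frac{\mu(1-\mu)}{N}.
\end{aligned}
```

The Fisher information is the expected squared score. It therefore weights the squared score at x = 1 by μ and at x = 0 by 1 - μ. A parameter with larger Fisher information admits a tighter lower bound on the variance of any unbiased estimate. This is the sense in which CIF treats such a parameter as confidently estimated.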
Discriminative Features via Generalized Eigenvectors
Karampatziakis, Nikos, Mineiro, Paul
Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system. In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second-order structure in the data. We focus on multiclass classification and show that features extracted from the generalized eigenvectors of the class-conditional second moments lead to classifiers with excellent empirical performance. Moreover, these features have attractive theoretical properties, such as inducing representations that are invariant to linear transformations of the input. We evaluate classifiers built from these features on three different tasks, obtaining state-of-the-art results.
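The core computation can be sketched with NumPy alone. Let A and B be the class-conditional second moments of classes a and b. Cholesky-whitening B turns the generalized eigenproblem A v = λ B v into an ordinary symmetric one. Directions with large λ carry more energy under class a than under class b, and projecting onto the top eigenvectors gives features. The paper's exact feature map, such as any nonlinearity applied after projection, is not reproduced here. The function names and the ridge term `reg` are our assumptions.

```python
import numpy as np

def class_second_moment(X):
    # Uncentered second moment E[x x^T] estimated from the rows of X.
    return X.T @ X / len(X)

def generalized_eigvecs(A, B, reg=1e-6):
    """Solve A v = lam * (B + reg*I) v for symmetric A and positive-definite B.
    With B + reg*I = L L^T and v = L^{-T} u, this becomes (L^{-1} A L^{-T}) u = lam * u."""
    d = B.shape[0]
    L = np.linalg.cholesky(B + reg * np.eye(d))
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ A @ L_inv.T)  # eigenvalues in ascending order
    V = L_inv.T @ U
    return evals[::-1], V[:, ::-1]                  # largest eigenvalues first

def gev_features(X, V, k):
    # Project examples onto the top-k generalized eigenvectors.
    return X @ V[:, :k]

rng = np.random.default_rng(0)
Xa = rng.normal(size=(200, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 0.5])  # toy class a
Xb = rng.normal(size=(200, 5))                                       # toy class b
A, B = class_second_moment(Xa), class_second_moment(Xb)
evals, V = generalized_eigvecs(A, B, reg=0.0)
features = gev_features(np.vstack([Xa, Xb]), V, k=2)
```

Take any invertible T. The pair (T^T A T, T^T B T) has the same generalized eigenvalues as (A, B), and its eigenvectors are T^{-1} V. Projected features are therefore unchanged, up to sign, under an invertible linear change of input coordinates. This matches the invariance the abstract describes.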
Modeling Documents with Deep Boltzmann Machines
Srivastava, Nitish, Salakhutdinov, Ruslan R, Hinton, Geoffrey E.
We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.