AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

Surpassing Human-Level Face Verification Performance on LFW with GaussianFace

Lu, Chaochao, Tang, Xiaoou

arXiv.org Machine LearningDec-19-2014

Face verification remains a challenging problem in very complex conditions with large variations such as pose, illumination, expression, and occlusions. This problem is exacerbated when we rely unrealistically on a single training data source, which is often insufficient to cover the intrinsically complex face variations. This paper proposes a principled multi-task learning approach based on Discriminative Gaussian Process Latent Variable Model, named GaussianFace, to enrich the diversity of training data. In comparison to existing methods, our model exploits additional data from multiple source-domains to improve the generalization performance of face verification in an unknown target-domain. Importantly, our model can adapt automatically to complex data distributions, and therefore can well capture complex face variations inherent in multiple sources. Extensive experiments demonstrate the effectiveness of the proposed model in learning from diverse data sources and generalize to unseen domain. Specifically, the accuracy of our algorithm achieves an impressive accuracy rate of 98.52% on the well-known and challenging Labeled Faces in the Wild (LFW) benchmark [23]. For the first time, the human-level performance in face verification (97.53%) [28] on LFW is surpassed.

artificial intelligence, face verification, machine learning, (17 more...)

arXiv.org Machine Learning

1404.384

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(2 more...)

Add feedback

Learning with Pseudo-Ensembles

Bachman, Philip, Alsharif, Ouais, Precup, Doina

arXiv.org Machine LearningDec-15-2014

We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child models spawned from a parent model by perturbing it according to some noise process. E.g., dropout [9] in a deep neural network trains a pseudo-ensemble of child subnetworks generated by randomly masking nodes in the parent network. We examine the relationship of pseudo-ensembles, which involve perturbation in model-space, to standard ensemble methods and existing notions of robustness, which focus on perturbation in observation-space. We present a novel regularizer based on making the behavior of a pseudo-ensemble robust with respect to the noise process generating it. In the fully-supervised setting, our regularizer matches the performance of dropout. But, unlike dropout, our regularizer naturally extends to the semi-supervised setting, where it produces state-of-the-art results. We provide a case study in which we transform the Recursive Neural Tensor Network of [19] into a pseudo-ensemble, which significantly improves its performance on a real-world sentiment analysis benchmark.

artificial intelligence, machine learning, regularization, (18 more...)

arXiv.org Machine Learning

1412.4864

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Bach in 2014: Music Composition with Recurrent Neural Network

Liu, I-Ting, Ramakrishnan, Bhiksha

arXiv.org Artificial IntelligenceDec-13-2014

We propose a framework for computer music composition that uses resilient propagation (RProp) and long short term memory (LSTM) recurrent neural network. In this paper, we show that LSTM network learns the structure and characteristics of music pieces properly by demonstrating its ability to recreate music. We also show that predicting existing music using RProp outperforms Back propagation through time (BPTT).

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

1412.3191

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Feature Weight Tuning for Recursive Neural Networks

Li, Jiwei

arXiv.org Artificial IntelligenceDec-12-2014

This paper addresses how a recursive neural network model can automatically leave out useless information and emphasize important evidence, in other words, to perform "weight tuning" for higher-level representation acquisition. We propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural Network (BENN), which automatically control how much one specific unit contributes to the higher-level representation. The proposed model can be viewed as incorporating a more powerful compositional function for embedding acquisition in recursive neural networks. Experimental results demonstrate the significant improvement over standard neural models.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

1412.3714

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Transportation > Air (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Score Function Features for Discriminative Learning: Matrix and Tensor Framework

Janzamin, Majid, Sedghi, Hanie, Anandkumar, Anima

arXiv.org Machine LearningDec-11-2014

Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features are based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of an overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1412.2863

Country:

North America > United States > California > Orange County > Irvine (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
(2 more...)

Add feedback

Deep Multi-Instance Transfer Learning

Kotzias, Dimitrios, Denil, Misha, Blunsom, Phil, de Freitas, Nando

arXiv.org Machine LearningDec-10-2014

We present a new approach for transferring knowledge from groups to individuals that comprise them. We evaluate our method in text, by inferring the ratings of individual sentences using full-review ratings. This approach, which combines ideas from transfer learning, deep learning and multi-instance learning, reduces the need for laborious human labelling of fine-grained data when abundant labels are available at the group level.

artificial intelligence, machine learning, sentiment, (15 more...)

arXiv.org Machine Learning

1411.3128

Country:

North America > United States > California (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.95)
Media > Film (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

Raiko, Tapani, Yao, Li, Cho, Kyunghyun, Bengio, Yoshua

arXiv.org Machine LearningDec-5-2014

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step. The proposed model is an unsupervised building block for deep learning that combines the desirable properties of NADE and multi-predictive training: (1) Its test likelihood can be computed analytically, (2) it is easy to generate independent samples from it, and (3) it uses an inference engine that is a superset of variational inference for Boltzmann machines. The proposed NADE-k is competitive with the state-of-the-art in density estimation on the two datasets tested.

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Machine Learning

1406.1485

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)

Add feedback

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

Chorowski, Jan, Bahdanau, Dzmitry, Cho, Kyunghyun, Bengio, Yoshua

arXiv.org Machine LearningDec-4-2014

Dzmitry Bahdanau Jacobs University Bremen, Germany Yoshua Bengio Université de Montréal CIFAR Senior Fellow We replace the Hidden Markov Model (HMM) which is traditionally used in in continuous speech recognition with a bidirectional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates that are comparable to the state-of-the-art HMM-based decoders, on the TIMIT dataset.

artificial intelligence, machine learning, sequence, (18 more...)

arXiv.org Machine Learning

1412.1602

Country:

North America > Canada > Quebec > Montreal (0.24)
Europe > Germany > Bremen > Bremen (0.24)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Add feedback

Visual Sentiment Prediction with Deep Convolutional Neural Networks

Xu, Can, Cetintas, Suleyman, Lee, Kuang-Chih, Li, Li-Jia

arXiv.org Machine LearningNov-20-2014

Images have become one of the most popular types of media through which users convey their emotions within online social networks. Although vast amount of research is devoted to sentiment analysis of textual data, there has been very limited work that focuses on analyzing sentiment of image data. In this work, we propose a novel visual sentiment prediction framework that performs image understanding with Deep Convolutional Neural Networks (CNN). Specifically, the proposed sentiment prediction framework performs transfer learning from a CNN with millions of parameters, which is pre-trained on large-scale data for object recognition. Experiments conducted on two real-world datasets from Twitter and Tumblr demonstrate the effectiveness of the proposed visual sentiment analysis framework.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1411.5731

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.94)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures

Hershey, John R., Roux, Jonathan Le, Weninger, Felix

arXiv.org Machine LearningNov-19-2014

Model-based methods and deep neural networks have both been tremendously successful paradigms in machine learning. In model-based methods, problem domain knowledge can be built into the constraints of the model, typically at the expense of difficulties during inference. In contrast, deterministic deep neural networks are constructed in such a way that inference is straightforward, but their architectures are generic and it is unclear how to incorporate knowledge. This work aims to obtain the advantages of both approaches. To do so, we start with a model-based approach and an associated inference algorithm, and \emph{unfold} the inference iterations as layers in a deep network. Rather than optimizing the original model, we \emph{untie} the model parameters across layers, in order to create a more powerful network. The resulting architecture can be trained discriminatively to perform accurate inference within a fixed network size. We show how this framework allows us to interpret conventional networks as mean-field inference in Markov random fields, and to obtain new architectures by instead using belief propagation as the inference algorithm. We then show its application to a non-negative matrix factorization model that incorporates the problem-domain knowledge that sound sources are additive. Deep unfolding of this model yields a new kind of non-negative deep neural network, that can be trained using a multiplicative backpropagation-style update algorithm. We present speech enhancement experiments showing that our approach is competitive with conventional neural networks despite using far fewer parameters.

algorithm, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

1409.2574

Country:

North America > United States > California (0.28)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Add feedback