AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

Deep Learning by Scattering

Mallat, Stéphane, Waldspurger, Irène

arXiv.org Machine LearningJun-25-2015

We introduce general scattering transforms as mathematical models of deep neural networks with l2 pooling. Scattering networks iteratively apply complex valued unitary operators, and the pooling is performed by a complex modulus. An expected scattering defines a contractive representation of a high-dimensional probability distribution, which preserves its mean-square norm. We show that unsupervised learning can be casted as an optimization of the space contraction to preserve the volume occupied by unlabeled examples, at each layer of the network. Supervised learning and classification are performed with an averaged scattering, which provides scattering estimations for multiple classes.

artificial intelligence, machine learning, operator, (18 more...)

arXiv.org Machine Learning

1306.5532

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Global Optimality in Tensor Factorization, Deep Learning, and Beyond

Haeffele, Benjamin D., Vidal, Rene

arXiv.org Machine LearningJun-24-2015

Techniques involving factorization are found in a wide range of applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically non-convex due to a multilinear form or other convexity destroying transformation. Here we build on ideas from convex relaxations of matrix factorizations and present a very general framework which allows for the analysis of a wide range of non-convex factorization problems - including matrix factorization, tensor factorization, and deep neural network training formulations. We derive sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough then from any initialization it is possible to find a global minimizer using a purely local descent algorithm. Our framework also provides a partial theoretical justification for the increasingly common use of Rectified Linear Units (ReLUs) in deep neural networks and offers guidance on deep network architectures and regularization strategies to facilitate efficient optimization.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1506.0754

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Attention-Based Models for Speech Recognition

Chorowski, Jan, Bahdanau, Dzmitry, Serdyuk, Dmitriy, Cho, Kyunghyun, Bengio, Yoshua

arXiv.org Machine LearningJun-24-2015

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks in- cluding machine translation, handwriting synthesis and image caption gen- eration. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation in reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER in single utterances and 20% in 10-times longer (repeated) utterances. Finally, we propose a change to the at- tention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to 17.6% level.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1506.07503

Country: Europe (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Add feedback

Efficient Learning for Undirected Topic Models

Gu, Jiatao, Li, Victor O. K.

arXiv.org Machine LearningJun-24-2015

Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator to speed up the learning based on Noise Contrastive Estimate, extended for documents of variant lengths and weighted inputs. Experiments on two benchmarks show that the new estimator achieves great learning efficiency and high accuracy on document retrieval and classification.

machine learning, natural language, rsm, (19 more...)

arXiv.org Machine Learning

1506.07477

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Parallel training of DNNs with Natural Gradient and Parameter Averaging

Povey, Daniel, Zhang, Xiaohui, Khudanpur, Sanjeev

arXiv.org Machine LearningJun-22-2015

We describe the neural-network training framework used in the Kaldi speech recognition toolkit, which is geared towards training DNNs with large amounts of training data using multiple GPU-equipped or multi-core machines. In order to be as hardware-agnostic as possible, we needed a way to use multiple machines without generating excessive network traffic. Our method is to average the neural network parameters periodically (typically every minute or two), and redistribute the averaged parameters to the machines for further training. Each machine sees different data. By itself, this method does not work very well. However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic-averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.

artificial intelligence, machine learning, matrix, (19 more...)

arXiv.org Machine Learning

1410.7455

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

Jin, Junqi, Fu, Kun, Cui, Runpeng, Sha, Fei, Zhang, Changshui

arXiv.org Machine LearningJun-20-2015

Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image caption system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience where the attention shifting among the visual regions imposes a thread of visual ordering. This alignment characterizes the flow of "abstract meaning", encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and contrast to published results on several popular datasets. We show that using either region-based attention or scene-specific contexts improves systems without those components. Furthermore, combining these two modeling ingredients attains the state-of-the-art performance.

caption, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1506.06272

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Soft-Deep Boltzmann Machines

Kiwaki, Taichi

arXiv.org Machine LearningJun-20-2015

We present a layered Boltzmann machine (BM) that can better exploit the advantages of a distributed representation. It is widely believed that deep BMs (DBMs) have far greater representational power than its shallow counterpart, restricted Boltzmann machines (RBMs). However, this expectation on the supremacy of DBMs over RBMs has not ever been validated in a theoretical fashion. In this paper, we provide both theoretical and empirical evidences that the representational power of DBMs can be actually rather limited in taking advantages of distributed representations. We propose an approximate measure for the representational power of a BM regarding to the efficiency of a distributed representation. With this measure, we show a surprising fact that DBMs can make inefficient use of distributed representations. Based on these observations, we propose an alternative BM architecture, which we dub soft-deep BMs (sDBMs). We show that sDBMs can more efficiently exploit the distributed representations in terms of the measure. Experiments demonstrate that sDBMs outperform several state-of-the-art models, including DBMs, in generative tasks on binarized MNIST and Caltech-101 silhouettes.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1505.02462

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

From Pixels to Torques: Policy Learning with Deep Dynamical Models

Wahlström, Niklas, Schön, Thomas B., Deisenroth, Marc Peter

arXiv.org Machine LearningJun-18-2015

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.

deep learning, neural network, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1502.02251

Country:

Europe > Sweden (0.28)
Europe > United Kingdom (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Collaborative Deep Learning for Recommender Systems

Wang, Hao, Wang, Naiyan, Yeung, Dit-Yan

arXiv.org Machine LearningJun-18-2015

Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendation. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Machine Learning

1409.2944

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.94)
Information Technology (0.70)
Media > Film (0.69)
Health & Medicine (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets

Lu, Zhiyun, May, Avner, Liu, Kuan, Garakani, Alireza Bagheri, Guo, Dong, Bellet, Aurélien, Fan, Linxi, Collins, Michael, Kingsbury, Brian, Picheny, Michael, Sha, Fei

arXiv.org Machine LearningJun-17-2015

Deep neural networks (DNNs) and other types of deep learning architecture have made significant advances [3, 4]. In both well-benchmarked tasks and real-world applications, such as automatic speech recognition [21, 34, 44] and image recognition [29, 48], deep learning architectures have achieved an unprecedented level of success and have generated major impact. Arguably, the most instrumental factors contributing to their success are: (1) learning from a huge amount of training data for highly complex models with millions to billions of parameters; (2) adopting simple but effective optimization methods such as stochastic gradient descent; (3) combatting overfitting with new schemes such as dropout [23]; and (4) computing with massive parallelism on GPUs. New techniques as well as "tricks of the trade" are frequently invented and added to the toolboxes for machine learning researchers and practitioners. In stark contrast, there have been many fewer publicly known successful applications of kernel methods (such as support vector machines) to problems at a scale comparable to the speech and image recognition problems tackled by DNNs. This is a surprising chasm, noting that kernel methods have been extensively studied both theoretically and empirically for their power of modeling highly nonlinear data [43]. Moreover, the connection between kernel methods and (infinite) neural networks has also been long noted [35, 51, 11]. Nonetheless, a common misconception is that it may be difficult, if not impossible, for kernel methods to catch up with deep learning methods in addressing large-scale learning problems.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

1411.4

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.83)

Industry: Government > Regional Government > North America Government > United States Government (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback