AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

Faster Asynchronous SGD

Odena, Augustus

arXiv.org Machine LearningJan-15-2016

Asynchronous distributed stochastic gradient descent methods have trouble converging because of stale gradients. A gradient update sent to a parameter server by a client is stale if the parameters used to calculate that gradient have since been updated on the server. Approaches have been proposed to circumvent this problem that quantify staleness in terms of the number of elapsed updates. In this work, we propose a novel method that quantifies staleness in terms of moving averages of gradient statistics. We show that this method outperforms previous methods with respect to convergence speed and scalability to many clients. We also discuss how an extension to this method can be used to dramatically reduce bandwidth costs in a distributed training context. In particular, our method allows reduction of total bandwidth usage by a factor of 5 with little impact on cost convergence. We also describe (and link to) a software library that we have used to simulate these algorithms deterministically on a single machine.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1601.04033

Genre: Research Report (0.71)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Deep Learning of Part-based Representation of Data Using Sparse Autoencoders with Nonnegativity Constraints

Hosseini-Asl, Ehsan, Zurada, Jacek M., Nasraoui, Olfa

arXiv.org Machine LearningJan-12-2016

We demonstrate a new deep learning autoencoder network, trained by a nonnegativity constraint algorithm (NCAE), that learns features which show part-based representation of data. The learning algorithm is based on constraining negative weights. The performance of the algorithm is assessed based on decomposing data into parts and its prediction performance is tested on three standard image data sets and one text dataset. The results indicate that the nonnegativity constraint forces the autoencoder to learn features that amount to a part-based representation of data, while improving sparsity and reconstruction quality in comparison with the traditional sparse autoencoder and Nonnegative Matrix Factorization. It is also shown that this newly acquired representation improves the prediction performance of a deep neural network.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/TNNLS.2015.2479223

1601.02733

Country: North America > United States > Kentucky > Jefferson County > Louisville (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Zhang, Yu, Chen, Guoguo, Yu, Dong, Yao, Kaisheng, Khudanpur, Sanjeev, Glass, James

arXiv.org Artificial IntelligenceJan-11-2016

ABSTRACT In this paper, we extend the deep long short-term memory (DL-STM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers. These direct links, called highway connections, enable unimpeded information flow across different layers and thus alleviate the gradient vanishing problem when building deeper LSTMs. We further introduce the latency-controlled bidirectional LSTMs (BLSTMs) which can exploit the whole history while keeping the latency under control. Efficient algorithms are proposed to train these novel networks using both frame and sequence discriminative criteria. Experiments on the AMI distant speech recognition (DSR) task indicate that we can train deeper LSTMs and achieve better improvement from sequence training with highway LSTMs (HLSTMs). It beats the strong DNN and DLSTM baselines with 15. 7% and 5. 3% relative improvement respectively. Index Terms -- Highway LSTM, CNTK, LSTM, Sequence Training 1. INTRODUCTION Recently the deep neural network (DNN)-based acoustic models (AMs) greatly improved automatic speech recognition (ASR) accuracy on many tasks [1, 2, 3, 4].

artificial intelligence, highway connection, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1510.08983

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bayesian Optimization in a Billion Dimensions via Random Embeddings

Wang, Ziyu, Hutter, Frank, Zoghi, Masrour, Matheson, David, de Freitas, Nando

arXiv.org Machine LearningJan-10-2016

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high-dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.

artificial intelligence, machine learning, optimization, (19 more...)

arXiv.org Machine Learning

1301.1942

Country:

North America > Canada (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

Mei, Hongyuan, Bansal, Mohit, Walter, Matthew R.

arXiv.org Artificial IntelligenceJan-8-2016

We propose an end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization. Our model first encodes a full set of over-determined database event records via an LSTM-based recurrent neural network, then utilizes a novel coarse-to-fine aligner to identify the small subset of salient records to talk about, and finally employs a decoder to generate free-form descriptions of the aligned, selected records. Our model achieves the best selection and generation results reported to-date (with 59% relative improvement in generation) on the benchmark WeatherGov dataset, despite using no specialized features or linguistic resources. Using an improved k-nearest neighbor beam filter helps further. We also perform a series of ablations and visualizations to elucidate the contributions of our key model components. Lastly, we evaluate the generalizability of our model on the RoboCup dataset, and get results that are competitive with or better than the state-of-the-art, despite being severely data-starved.

artificial intelligence, machine learning, proceedings, (16 more...)

arXiv.org Artificial Intelligence

1509.00838

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports > Soccer (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dropout as data augmentation

Bouthillier, Xavier, Konda, Kishore, Vincent, Pascal, Memisevic, Roland

arXiv.org Machine LearningJan-7-2016

Dropout is typically interpreted as bagging a large number of models sharing parameters. We show that using dropout in a network can also be interpreted as a kind of data augmentation in the input space without domain knowledge. We present an approach to projecting the dropout noise within a network back into the input space, thereby generating augmented versions of the training data, and we show that training a deterministic network on the augmented samples yields similar results. Finally, we propose a new dropout noise scheme based on our observations and show that it improves dropout results without adding significant computational cost.

artificial intelligence, dropout, machine learning, (16 more...)

arXiv.org Machine Learning

1506.087

Country:

North America (0.29)
Europe > Germany (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism

Firat, Orhan, Cho, Kyunghyun, Bengio, Yoshua

arXiv.org Machine LearningJan-5-2016

We propose multi-way, multilingual neural machine translation. The proposed approach enables a single neural translation model to translate between multiple languages, with a number of parameters that grows only linearly with the number of languages. This is made possible by having a single attention mechanism that is shared across all language pairs. We train the proposed multi-way, multilingual model on ten language pairs from WMT'15 simultaneously and observe clear performance improvements over models trained on only one language pair. In particular, we observe that the proposed model significantly improves the translation quality of low-resource language pairs.

machine learning, natural language, translation, (16 more...)

arXiv.org Machine Learning

1601.01073

Country: North America > Canada (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Semi-supervised Tuning from Temporal Coherence

Maltoni, Davide, Lomonaco, Vincenzo

arXiv.org Machine LearningJan-4-2016

Recent works demonstrated the usefulness of temporal coherence to regularize supervised training or to learn invariant features with deep architectures. In particular, enforcing smooth output changes while presenting temporally-closed frames from video sequences, proved to be an effective strategy. In this paper we prove the efficacy of temporal coherence for semi-supervised incremental tuning. We show that a deep architecture, just mildly trained in a supervised manner, can progressively improve its classification accuracy, if exposed to video sequences of unlabeled data. The extent to which, in some cases, a semi-supervised tuning allows to improve classification accuracy (approaching the supervised one) is somewhat surprising. A number of control experiments pointed out the fundamental role of temporal coherence.

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1511.03163

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Prediction

Lee, Kisuk, Zlateski, Aleksandar, Ashwin, Vishwanathan, Seung, H. Sebastian

Neural Information Processing SystemsDec-31-2015

Efforts to automate the reconstruction of neural circuits from 3D electron microscopic (EM) brain images are critical for the field of connectomics. An important computation for reconstruction is the detection of neuronal boundaries. Images acquired by serial section EM, a leading 3D EM technique, are highly anisotropic, with inferior quality along the third dimension. For such images, the 2D max-pooling convolutional network has set the standard for performance at boundary detection. Here we achieve a substantial gain in accuracy through three innovations. Following the trend towards deeper networks for object recognition, we use a much deeper network than previously employed for boundary detection. Second, we incorporate 3D as well as 2D filters, to enable computations that use 3D context. Finally, we adopt a recursively trained architecture in which a first network generates a preliminary boundary map that is provided as input along with the original image to a second network that generates a final boundary map. Backpropagation training is accelerated by ZNN, a new implementation of 3D convolutional networks that uses multicore CPU parallelism for speed. Our hybrid 2D-3D architecture could be more generally applicable to other types of anisotropic 3D images, including video, and our recursive framework for any image labeling problem.

convnet, segmentation, vd2d3d, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Learning Structured Output Representation using Deep Conditional Generative Models

Sohn, Kihyuk, Lee, Honglak, Yan, Xinchen

Neural Information Processing SystemsDec-31-2015

Supervised deep learning has been successfully applied for many recognition problems in machine learning and computer vision. Although it can approximate a complex many-to-one function very well when large number of training data is provided, the lack of probabilistic inference of the current supervised deep learning methods makes it difficult to model a complex structured output representations. In this work, we develop a scalable deep conditional generative model for structured output variables using Gaussian latent variables. The model is trained efficiently in the framework of stochastic gradient variational Bayes, and allows a fast prediction using stochastic feed-forward inference. In addition, we provide novel strategies to build a robust structured prediction algorithms, such as recurrent prediction network architecture, input noise-injection and multi-scale prediction training methods. In experiments, we demonstrate the effectiveness of our proposed algorithm in comparison to the deterministic deep neural network counterparts in generating diverse but realistic output representations using stochastic inference. Furthermore, the proposed schemes in training methods and architecture design were complimentary, which leads to achieve strong pixel-level object segmentation and semantic labeling performance on Caltech-UCSD Birds 200 and the subset of Labeled Faces in the Wild dataset.

cvae, latent variable, prediction, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback