AITopics | Deep Learning

Collaborating Authors

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers, and distinguish "deep learning" of these networks from previous work on artificial neural nets.

News Overviews Instructional Materials AI-Alerts Classics

On the Inductive Bias of Dropout

Helmbold, David P., Long, Philip M.

arXiv.org Artificial IntelligenceFeb-17-2015

Dropout is a simple but effective technique for learning in neural networks and other settings. A sound theoretical understanding of dropout is needed to determine when dropout should be applied and how to use it most effectively. In this paper we continue the exploration of dropout as a regularizer pioneered by Wager, et.al. We focus on linear classification where a convex proxy to the misclassification loss (i.e. the logistic loss used in logistic regression) is minimized. We show: (a) when the dropout-regularized criterion has a unique minimizer, (b) when the dropout-regularization penalty goes to infinity with the weights, and when it remains bounded, (c) that the dropout regularization can be non-monotonic as individual weights increase from 0, and (d) that the dropout regularization penalty may not be convex. This last point is particularly surprising because the combination of dropout regularization with any convex loss proxy is always a convex function. In order to contrast dropout regularization with $L_2$ regularization, we formalize the notion of when different sources are more compatible with different regularizers. We then exhibit distributions that are provably more compatible with dropout regularization than $L_2$ regularization, and vice versa. These sources provide additional insight into how the inductive biases of dropout and $L_2$ regularization differ. We provide some similar results for $L_1$ regularization.

artificial intelligence, exp, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1412.4736

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

Schwarz, Andreas, Huemmer, Christian, Maas, Roland, Kellermann, Walter

arXiv.org Machine LearningFeb-16-2015

ABSTRACT We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, both compared to logmelspec features extracted from noisy signals, and features enhanced by spectral subtraction. Index Terms-- Speech Recognition, Reverberation, Diffuse Noise, Deep Neural Networks 1. INTRODUCTION In automatic speech recognizers (ASR) based on Gaussian Mixture Models and Hidden Markov Models (GMM-HMM), a wide variety of transformations and feature extraction steps is currently being employed with the aim of extracting and normalizing the information contained in the time-domain input signal as efficiently as possible. Recently, with the development of effective training methods for acoustic models based on multiple-layer neural networks, which are often summarized under the term "deep neural networks" (DNN) [1], it has become possible for the acoustic model to learn relationships between features and phonemes to a higher degree than it is possible with manually implemented feature transformation steps.

artificial intelligence, diffuseness, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/ICASSP.2015.7178798

1410.2479

Country:

Europe > Italy (0.16)
Europe > France (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Add feedback

Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks

Payan, Adrien, Montana, Giovanni

arXiv.org Machine LearningFeb-9-2015

Alzheimer's Disease (AD) is the most common type of dementia. Dementia refers to diseases that are characterized by a loss of memory or other cognitive impairments, and is caused by damage to nerve cells in the brain. In the United States, an estimated 5.2 million people of all ages have AD in 2014. Mild cognitive impairment (MCI) is a condition in which an individual has mild but noticeable changes in thinking abilities. Individuals with MCI are more likely to develop AD than inviduals without [1]. Early detection of the disease can be achieved by magnetic resonance imaging (MRI), a technique that uses a magnetic field and radio waves to create a detailed 3D image of the brain.

artificial intelligence, autoencoder, machine learning, (19 more...)

arXiv.org Machine Learning

1502.02506

Country: North America > United States (0.34)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generative Moment Matching Networks

Li, Yujia, Swersky, Kevin, Zemel, Richard

arXiv.org Machine LearningFeb-9-2015

We consider the problem of learning deep generative models from data. We formulate a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks (Goodfellow et al., 2014). Training a generative adversarial network, however, requires careful optimization of a difficult minimax program. Instead, we utilize a technique from statistical hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simple objective that can be interpreted as matching all orders of statistics between a dataset and samples from the model, and can be trained by backpropagation. We further boost the performance of this approach by combining our generative network with an auto-encoder network, using MMD to learn to generate codes that can then be decoded to produce samples. We show that the combination of these techniques yields excellent generative models compared to baseline approaches as measured on MNIST and the Toronto Face Database.

artificial intelligence, generative model, machine learning, (15 more...)

arXiv.org Machine Learning

1502.02761

Country: North America > Canada > Ontario > Toronto (0.36)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Learning with Limited Numerical Precision

Gupta, Suyog, Agrawal, Ankur, Gopalakrishnan, Kailash, Narayanan, Pritish

arXiv.org Machine LearningFeb-9-2015

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of low-precision fixed-point computations, we observe the rounding scheme to play a crucial role in determining the network's behavior during training. Our results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

1502.02551

Country: North America (0.46)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, Sun, Jian

arXiv.org Artificial IntelligenceFeb-6-2015

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.

artificial intelligence, machine learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

1502.01852

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Confident Information First Principle for Parametric Reduction and Model Selection of Boltzmann Machines

Zhao, Xiaozhao, Hou, Yuexian, Song, Dawei, Li, Wenjie

arXiv.org Machine LearningFeb-5-2015

Typical dimensionality reduction (DR) methods are often data-oriented, focusing on directly reducing the number of random variables (features) while retaining the maximal variations in the high-dimensional data. In unsupervised situations, one of the main limitations of these methods lies in their dependency on the scale of data features. This paper aims to address the problem from a new perspective and considers model-oriented dimensionality reduction in parameter spaces of binary multivariate distributions. Specifically, we propose a general parameter reduction criterion, called Confident-Information-First (CIF) principle, to maximally preserve confident parameters and rule out less confident parameters. Formally, the confidence of each parameter can be assessed by its contribution to the expected Fisher information distance within the geometric manifold over the neighbourhood of the underlying real distribution. We then revisit Boltzmann machines (BM) from a model selection perspective and theoretically show that both the fully visible BM (VBM) and the BM with hidden units can be derived from the general binary multivariate distribution using the CIF principle. This can help us uncover and formalize the essential parts of the target density that BM aims to capture and the non-essential parts that BM should discard. Guided by the theoretical analysis, we develop a sample-specific CIF for model selection of BM that is adaptive to the observed samples. The method is studied in a series of density estimation experiments and has been shown effective in terms of the estimate accuracy.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Machine Learning

1502.01705

Country:

Asia > China (0.46)
North America > Canada (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

From neural PCA to deep unsupervised learning

Valpola, Harri

arXiv.org Machine LearningFeb-2-2015

A network supporting deep unsupervised learning is presented. The network is an autoencoder with lateral shortcut connections from the encoder to decoder at each level of the hierarchy. The lateral shortcut connections allow the higher levels of the hierarchy to focus on abstract invariant features. While standard autoencoders are analogous to latent variable models with a single layer of stochastic variables, the proposed network is analogous to hierarchical latent variables models. Learning combines denoising autoencoder and denoising sources separation frameworks. Each layer of the network contributes to the cost function a term which measures the distance of the representations produced by the encoder and the decoder. Since training signals originate from all levels of the network, all layers can learn efficiently even in deep networks. The speedup offered by cost terms from higher levels of the hierarchy and the ability to learn invariant features are demonstrated in experiments.

artificial intelligence, information, machine learning, (16 more...)

arXiv.org Machine Learning

1411.7783

Country:

Asia (0.67)
North America > United States (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Deep learning of fMRI big data: a novel approach to subject-transfer decoding

Koyamada, Sotetsu, Shikauchi, Yumi, Nakae, Ken, Koyama, Masanori, Ishii, Shin

arXiv.org Machine LearningJan-31-2015

As a technology to read brain states from measurable brain activities, brain decoding are widely applied in industries and medical sciences. In spite of high demands in these applications for a universal decoder that can be applied to all individuals simultaneously, large variation in brain activities across individuals has limited the scope of many studies to the development of individual-specific decoders. In this study, we used deep neural network (DNN), a nonlinear hierarchical model, to construct a subject-transfer decoder. Our decoder is the first successful DNN-based subject-transfer decoder. When applied to a large-scale functional magnetic resonance imaging (fMRI) database, our DNN-based decoder achieved higher decoding accuracy than other baseline methods, including support vector machine (SVM). In order to analyze the knowledge acquired by this decoder, we applied principal sensitivity analysis (PSA) to the decoder and visualized the discriminative features that are common to all subjects in the dataset. Our PSA successfully visualized the subject-independent features contributing to the subject-transferability of the trained decoder.

artificial intelligence, decoder, machine learning, (18 more...)

arXiv.org Machine Learning

1502.00093

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Add feedback

Teaching Deep Convolutional Neural Networks to Play Go

Clark, Christopher, Storkey, Amos

arXiv.org Artificial IntelligenceJan-27-2015

Mastering the game of Go has remained a long standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more 'humanlike' way to play the game would be to rely on pattern recognition abilities rather then brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to 'hard code' symmetries that are expect to exist in the target function, and demonstrate in an ablation study they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

1412.3409

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Go (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback