Deep Learning
The risk-taker pushing Intel into the new world of artificial intelligence
To get a sense of computer scientist Naveen Rao, just take a look at his hands. The 42-year-old has busted all 10 of his fingers over a lifetime of skiing, skateboarding, bicycling, rollerblading, race-car driving, wrestling and hoops. He's not a clod; he's a risk taker who pushes physical and mental boundaries. On the mental side, he's trying to quicken the computer industry's move into a new age of artificial intelligence by creating chips and software inspired by the structure and behavior of the human brain. What sets Rao apart from others attempting the same thing is the fact that Intel last year bought his San Diego company, Nervana, for $400 million.
Credit Assignment in Deep Learning - Tim Dettmers
This morning I got an email about my blog post discussing the history of deep learning, which rattled me back into a time of my academic career that I would rather not think about. It was a low point which nearly ended my master's studies at the University of Lugano, and it made me feel so bad about blogging that I took two long years to recover. When I started my master's, I worked on blog posts for NVIDIA which featured introductions to deep learning. I hence discussed what I thought to be the historical milestones with the largest impact, but in doing so I inadvertently assigned credit to researchers that I thought had a good impact on the field. I worked on this blog post and circulated it in my deep learning class's forums, to the dismay of my then advisor, who holds the opposite view.
Putting the "Science" Back in Data Science
If you start your data analysis by simply stating hypotheses and applying Machine Learning algorithms, you are going about it the wrong way. In a few words, I studied the past 27 years of Business Management literature and I tried to develop an epistemologically disruptive approach to measure and predict service quality, mixing Business Administration with Electrical Engineering concepts. Ah, profit can be predicted using Deep Neural Networks, with data from Market Research, Financial Data and word embeddings from Social Media as features! So, the SCIENCE in Data Science is not only about Machine Learning, Deep Learning, Natural Language Processing, A.I.
Gated Graph Sequence Neural Networks
Li, Yujia, Tarlow, Daniel, Brockschmidt, Marc, Zemel, Richard
Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be described as abstract data structures.
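To make the "gated recurrent units on a graph" idea concrete, here is a minimal NumPy sketch of one propagation step. It is a simplification under stated assumptions, not the authors' implementation: it uses a single edge type, a single message matrix, and hypothetical parameter names (`W_msg`, `W_z`, `U_z`, and so on), and it omits training and the sequence-output extension.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d, rng):
    # Hypothetical parameter names; all matrices are d x d for simplicity.
    names = ["W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
    return {name: rng.normal(scale=0.1, size=(d, d)) for name in names}

def ggnn_step(h, adj, p):
    """One gated propagation step.

    h   : (n_nodes, d) node states
    adj : (n_nodes, n_nodes), adj[i, j] = 1 if there is an edge j -> i
    """
    # Each node sums the transformed states of its incoming neighbors.
    a = adj @ (h @ p["W_msg"])
    # GRU-style update: gates decide how much of the old state to keep.
    z = sigmoid(a @ p["W_z"] + h @ p["U_z"])               # update gate
    r = sigmoid(a @ p["W_r"] + h @ p["U_r"])               # reset gate
    h_new = np.tanh(a @ p["W_h"] + (r * h) @ p["U_h"])     # candidate state
    return (1.0 - z) * h + z * h_new

# Chain graph 0 -> 1 -> 2, with only node 0 carrying an initial annotation.
rng = np.random.default_rng(0)
params = init_params(4, rng)
adj = np.zeros((3, 3))
adj[1, 0] = 1.0
adj[2, 1] = 1.0
h0 = np.zeros((3, 4))
h0[0] = 1.0
h1 = ggnn_step(h0, adj, params)   # information has reached node 1
h2 = ggnn_step(h1, adj, params)   # and now node 2
```

Each step moves information one hop along the edges, which is why the number of propagation steps bounds how far a node can "see" in the graph.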
MR Acquisition-Invariant Representation Learning
Kouw, Wouter M., Loog, Marco, Bartels, Lambertus W., Mendrik, Adriënne M.
Voxelwise classification is a popular and effective method for tissue quantification in brain magnetic resonance imaging (MRI) scans. However, there are often large differences over sets of MRI scans due to how they were acquired (i.e. field strength, vendor, protocol), that lead to variation in, among others, pixel intensities, tissue contrast, signal-to-noise ratio, resolution, slice thickness and magnetic field inhomogeneities. Classifiers trained on data from a specific scanner fail or under-perform when applied to data that was differently acquired. In order to address this lack of generalization, we propose a Siamese neural network (MRAI-net) to learn a representation that minimizes the between-scanner variation, while maintaining the contrast between brain tissues necessary for brain tissue quantification. The proposed MRAI-net was evaluated on both simulated and real MRI data. After learning the MR acquisition invariant representation, any supervised classifier can be applied. In this paper we showed that applying a linear classifier on the MRAI representation outperforms supervised convolutional neural network classifiers for tissue classification when little target training data is available.
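The abstract does not state the training loss, so the following is only a generic Siamese contrastive loss, swapped in to illustrate the idea rather than to reproduce MRAI-net. It assumes pairs of embedded patches from two different scanners: pairs showing the same tissue are pulled together, and pairs showing different tissues are pushed at least `margin` apart. The function name, the Euclidean distance, and the margin value are all assumptions.

```python
import numpy as np

def siamese_pair_loss(za, zb, same_tissue, margin=1.0):
    """Generic contrastive loss over embedded patch pairs (illustrative only).

    za, zb      : (n_pairs, d) embeddings of patches from scanner A and scanner B
    same_tissue : (n_pairs,) 1.0 if both patches show the same tissue, else 0.0
    """
    za = np.asarray(za, dtype=float)
    zb = np.asarray(zb, dtype=float)
    same = np.asarray(same_tissue, dtype=float)
    dist = np.linalg.norm(za - zb, axis=1)
    # Same tissue, different scanner: penalize any distance (scanner invariance).
    pull = same * dist ** 2
    # Different tissue: penalize only if closer than the margin (keep contrast).
    push = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pull + push))
```

Once such a representation is learned, the embeddings can be fed to any supervised classifier, which matches the abstract's use of a linear classifier on top of the learned representation.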
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Hsu, Wei-Ning, Zhang, Yu, Glass, James
We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
Total stability of kernel methods
Christmann, Andreas, Xiang, Daohong, Zhou, Ding-Xuan
Regularized empirical risk minimization using kernels and their corresponding reproducing kernel Hilbert spaces (RKHSs) plays an important role in machine learning. However, the actually used kernel often depends on one or on a few hyperparameters or the kernel is even data dependent in a much more complicated manner. Examples are Gaussian RBF kernels, kernel learning, and hierarchical Gaussian kernels which were recently proposed for deep learning. Therefore, the actually used kernel is often computed by a grid search or in an iterative manner and can often only be considered as an approximation to the "ideal" or "optimal" kernel. The paper gives conditions under which classical kernel based methods based on a convex Lipschitz loss function and on a bounded and smooth kernel are stable, if the probability measure $P$, the regularization parameter $\lambda$, and the kernel $k$ may slightly change in a simultaneous manner. Similar results are also given for pairwise learning. Therefore, the topic of this paper is somewhat more general than in classical robust statistics, where usually only the influence of small perturbations of the probability measure $P$ on the estimated function is considered.
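For orientation, the estimator whose stability is at issue can be written in the standard form below. The notation is an assumption chosen for illustration ($L$ for the loss, $H_k$ for the RKHS of kernel $k$, $\gamma$ for the kernel width) and is not necessarily the paper's own.

```latex
% Regularized risk minimizer for probability measure P, parameter \lambda, kernel k
f_{P,\lambda,k} \;=\; \arg\min_{f \in H_k}\;
  \mathbb{E}_{(X,Y)\sim P}\, L\bigl(X, Y, f(X)\bigr) \;+\; \lambda\,\lVert f \rVert_{H_k}^{2}

% Example of a kernel with one hyperparameter: the Gaussian RBF kernel
k_{\gamma}(x, x') \;=\; \exp\!\bigl(-\lVert x - x' \rVert^{2} / \gamma^{2}\bigr)
```

Read this way, "total stability" asks how much $f_{P,\lambda,k}$ can change when $P$, $\lambda$, and $k$ (for instance through $\gamma$) are all perturbed slightly at once, rather than $P$ alone.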
Neural Optimizer Search with Reinforcement Learning
Bello, Irwan, Zoph, Barret, Vasudevan, Vijay, Le, Quoc V.
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.
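The abstract names PowerSign and AddSign without giving their update equations. The sketch below assumes the forms usually quoted for them: the gradient `g` is scaled by `exp(sign(g) * sign(m))` for PowerSign and by `1 + sign(g) * sign(m)` for AddSign, where `m` is a running average of the gradient. The function names, the hyperparameter defaults, and the absence of any decay schedule are assumptions made for illustration.

```python
import numpy as np

def sign_agreement_step(w, g, m, rule="powersign", lr=0.1, beta=0.9):
    """One update step; returns the new weights and the new running average."""
    m = beta * m + (1.0 - beta) * g        # running average of the gradient
    agree = np.sign(g) * np.sign(m)        # +1 if g and m agree in sign, -1 if not
    if rule == "powersign":
        scale = np.exp(agree)              # e when they agree, 1/e when they disagree
    elif rule == "addsign":
        scale = 1.0 + agree                # 2 when they agree, 0 when they disagree
    else:
        raise ValueError(f"unknown rule: {rule}")
    return w - lr * scale * g, m

def minimize_quadratic(rule, steps=50):
    """Toy check: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself."""
    w = np.array([1.0, -2.0])
    m = np.zeros_like(w)
    for _ in range(steps):
        w, m = sign_agreement_step(w, w.copy(), m, rule=rule)
    return w
```

Both rules take larger steps when the current gradient agrees in sign with its running average and smaller (or zero) steps when it does not, which is the shared intuition behind the two names.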
Deconvolutional Paragraph Representation Learning
Zhang, Yizhe, Shen, Dinghan, Wang, Guoyin, Gan, Zhe, Henao, Ricardo, Carin, Lawrence
Learning latent representations from long text sequences is an important first step in many natural language processing applications. Recurrent Neural Networks (RNNs) have become a cornerstone for this challenging task. However, the quality of sentences during RNN-based decoding (reconstruction) decreases with the length of the text. We propose a sequence-to-sequence, purely convolutional and deconvolutional autoencoding framework that is free of the above issue, while also being computationally efficient. The proposed method is simple, easy to implement and can be leveraged as a building block for many applications. We show empirically that compared to RNNs, our framework is better at reconstructing and correcting long paragraphs. Quantitative evaluation on semi-supervised text classification and summarization tasks demonstrates the potential for better utilization of long unlabeled text data.