Deep Learning
Automatic Relevance Determination For Deep Generative Models
Karaletsos, Theofanis, Rätsch, Gunnar
A recurring problem when building probabilistic latent variable models is regularization and model selection, for instance, the choice of the dimensionality of the latent space. In the context of belief networks with latent variables, this problem has been adressed with Automatic Relevance Determination (ARD) employing Monte Carlo inference. We present a variational inference approach to ARD for Deep Generative Models using doubly stochastic variational inference to provide fast and scalable learning. We show empirical results on a standard dataset illustrating the effects of contracting the latent space automatically. We show that the resulting latent representations are significantly more compact without loss of expressive power of the learned models.
Traversing Knowledge Graphs in Vector Space
Guu, Kelvin, Miller, John, Liang, Percy
Path queries on a knowledge graph can be used to answer compositional questions such as "What languages are spoken by people living in Lisbon?". However, knowledge graphs often have missing facts (edges) which disrupts path queries. Recent models for knowledge base completion impute missing facts by embedding knowledge graphs in vector spaces. We show that these models can be recursively applied to answer path queries, but that they suffer from cascading errors. This motivates a new "compositional" training objective, which dramatically improves all models' ability to answer path queries, in some cases more than doubling accuracy. On a standard knowledge base completion task, we also demonstrate that compositional training acts as a novel form of structural regularization, reliably improving performance across all base models (reducing errors by up to 43%) and achieving new state-of-the-art results.
Listen, Attend and Spell
Chan, William, Jaitly, Navdeep, Le, Quoc V., Vinyals, Oriol
We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
When Are Tree Structures Necessary for Deep Learning of Representations?
Li, Jiwei, Luong, Minh-Thang, Jurafsky, Dan, Hovy, Eudard
Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. But there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper we benchmark {\bf recursive} neural models against sequential {\bf recurrent} neural models (simple recurrent and LSTM models), enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; (4) semantic relation extraction (e.g., {\em component-whole} between nouns). Our goal is to understand better when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require associating headwords across a long distance, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help understand the limitations of both classes of models, and suggest directions for improving recurrent models.
A Generative Model for Multi-Dialect Representation
In the era of deep learning several unsupervised models have been developed to capture the key features in unlabeled handwritten data. Popular among them is the Restricted Boltzmann Machines RBM. However, due to the novelty in handwritten multidialect data, the RBM may fail to generate an efficient representation. In this paper we propose a generative model, the Mode Synthesizing Machine MSM for on-line representation of real life handwritten multidialect language data. The MSM takes advantage of the hierarchical representation of the modes of a data distribution using a two-point error update to learn a sequence of representative multidialects in a generative way. Experiments were performed to evaluate the performance of the MSM over the RBM with the former attaining much lower error values than the latter on both independent and mixed data set.
A Deep Learning Approach to Structured Signal Recovery
Mousavi, Ali, Patel, Ankit B., Baraniuk, Richard G.
Abstract--In this paper, we develop a new framework for sensing and recovering structured signals. In contrast to compressive sensing (CS) systems that employ linear measurements, sparse representations, and computationally complex convex/greedy algorithms, we introduce a deep learning framework that supports both linear and mildly nonlinear measurements, that learns a structured representation from training data, and that efficiently computes a signal estimate. In particular, we apply a stacked denoising autoencoder (SDA), as an unsupervised feature learner. SDA enables us to capture statistical dependencies between the different elements of certain signals and improve signal recovery performance as compared to the CS approach. Many configurations for x and Γ(.) have been explored in the literature for this problem; however, one of the most useful ones is to have a sparse signal x and a linear Γ(.), i.e., y Γ(x) Φx. Compressive sensing (CS) [1]-[3] is a field that tries to solve this linear inverse problem in case that x has a sparse representation, i.e., there exists an N N basis matrix Ψ [ψ Department of Electrical and Computer Engineering Rice University Houston, TX 77005 (i) How to recover the signal x from a given measurement vector y and operator Γ(.)? (ii) How to design the measurement operator Γ(.)? (iii) If we are concerned with any type of structure, How could we find a representation in which the signal x has that structure? Although there has been a considerable progress in CS and particularly in the answers of aforementioned questions, our goal is to go beyond the state-of-the-art results.
Unbounded Bayesian Optimization via Regularization
Shahriari, Bobak, Bouchard-Côté, Alexandre, de Freitas, Nando
Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning. Currently, the established Bayesian optimization practice requires a user-defined bounding box which is assumed to contain the optimizer. However, when little is known about the probed objective function, it can be difficult to prescribe such bounds. In this work we modify the standard Bayesian optimization framework in a principled way to allow automatic resizing of the search space. We introduce two alternative methods and compare them on two common synthetic benchmarking test functions as well as the tasks of tuning the stochastic gradient descent optimizer of a multi-layered perceptron and a convolutional neural network on MNIST.
Improving Decision Analytics with Deep Learning: The Case of Financial Disclosures
Fehrer, Ralph, Feuerriegel, Stefan
Decision analytics commonly focuses on the text mining of financial news sources in order to provide managerial decision support and to predict stock market movements. Existing predictive frameworks almost exclusively apply traditional machine learning methods, whereas recent research indicates that traditional machine learning methods are not sufficiently capable of extracting suitable features and capturing the non-linear nature of complex tasks. As a remedy, novel deep learning models aim to overcome this issue by extending traditional neural network models with additional hidden layers. Indeed, deep learning has been shown to outperform traditional methods in terms of predictive performance. In this paper, we adapt the novel deep learning technique to financial decision support. In this instance, we aim to predict the direction of stock movements following financial disclosures. As a result, we show how deep learning can outperform the accuracy of random forests as a benchmark for machine learning by 5.66%.
Dependency-based Convolutional Neural Networks for Sentence Embedding
Ma, Mingbo, Huang, Liang, Xiang, Bing, Zhou, Bowen
In sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies. To combine deep learning with linguistic structures, we propose a dependency-based convolution approach, making use of tree-based n-grams rather than surface ones, thus utlizing nonlocal interactions between words. Our model improves sequential baselines on all four sentiment and question classification tasks, and achieves the highest published accuracy on TREC.
Time-series modeling with undecimated fully convolutional neural networks
We present a new convolutional neural network-based time-series model. Typical convolutional neural network (CNN) architectures rely on the use of max-pooling operators in between layers, which leads to reduced resolution at the top layers. Instead, in this work we consider a fully convolutional network (FCN) architecture that uses causal filtering operations, and allows for the rate of the output signal to be the same as that of the input signal. We furthermore propose an undecimated version of the FCN, which we refer to as the undecimated fully convolutional neural network (UFCNN), and is motivated by the undecimated wavelet transform. Our experimental results verify that using the undecimated version of the FCN is necessary in order to allow for effective time-series modeling. The UFCNN has several advantages compared to other time-series models such as the recurrent neural network (RNN) and long short-term memory (LSTM), since it does not suffer from either the vanishing or exploding gradients problems, and is therefore easier to train. Convolution operations can also be implemented more efficiently compared to the recursion that is involved in RNN-based models. We evaluate the performance of our model in a synthetic target tracking task using bearing only measurements generated from a state-space model, a probabilistic modeling of polyphonic music sequences problem, and a high frequency trading task using a time-series of ask/bid quotes and their corresponding volumes. Our experimental results using synthetic and real datasets verify the significant advantages of the UFCNN compared to the RNN and LSTM baselines.