AITopics | Deep Learning

Deep Learning

New computational algorithms make it possible to build neural networks with many input nodes and many layers; this capability distinguishes "deep learning" with such networks from previous work on artificial neural nets.

Automatic Relevance Determination For Deep Generative Models

Karaletsos, Theofanis, Rätsch, Gunnar

arXiv.org Machine Learning, Aug-26-2015

A recurring problem when building probabilistic latent variable models is regularization and model selection, for instance, the choice of the dimensionality of the latent space. In the context of belief networks with latent variables, this problem has been addressed with Automatic Relevance Determination (ARD) employing Monte Carlo inference. We present a variational inference approach to ARD for Deep Generative Models using doubly stochastic variational inference to provide fast and scalable learning. We show empirical results on a standard dataset illustrating the effects of contracting the latent space automatically. We show that the resulting latent representations are significantly more compact without loss of expressive power of the learned models.
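ARD's pruning effect is easiest to see in a much smaller model than the paper's. The following is a minimal sketch under stated assumptions, not the authors' doubly stochastic variational method. It runs classic MacKay-style evidence updates for Bayesian linear regression, with one prior precision α_i per input dimension and a noise precision β that is assumed known. The helper names `invert` and `ard_regression`, the toy data, and the clipping constants are all hypothetical. Dimensions that the data do not support are driven toward very large α_i, which effectively removes them. This mirrors the contraction that the abstract describes for latent dimensions.

```python
import random


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (fine for small dense matrices)."""
    n = len(a)
    # Augment A with the identity matrix: [A | I]
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]  # the right half now holds A^-1


def ard_regression(X, y, beta, iters=100, cap=1e8, tiny=1e-30):
    """Hypothetical helper: MacKay-style ARD for linear regression with known noise
    precision beta. Returns one prior precision alpha_i per input dimension and the
    posterior mean weights; a huge alpha_i means dimension i has been switched off."""
    n, d = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(d)] for i in range(d)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    alpha = [1.0] * d
    mean = [0.0] * d
    for _ in range(iters):
        # Gaussian posterior over weights: Sigma = (beta X^T X + diag(alpha))^-1
        a = [[beta * xtx[i][j] + (alpha[i] if i == j else 0.0) for j in range(d)]
             for i in range(d)]
        sigma = invert(a)
        mean = [beta * sum(sigma[i][j] * xty[j] for j in range(d)) for i in range(d)]
        # gamma_i in [0, 1]: how well the data determine weight i
        gamma = [1.0 - alpha[i] * sigma[i][i] for i in range(d)]
        # Evidence re-estimate alpha_i = gamma_i / mean_i^2, clipped for stability
        alpha = [min(max(g, tiny) / max(mu * mu, tiny), cap) for g, mu in zip(gamma, mean)]
    return alpha, mean


# Toy data (an assumption): only the first two of five inputs influence y.
random.seed(0)
true_w = [2.0, -1.5, 0.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in true_w] for _ in range(200)]
y = [sum(w * x for w, x in zip(true_w, row)) + random.gauss(0.0, 0.1) for row in X]

alpha, mean = ard_regression(X, y, beta=100.0)
print([f"{a:.3g}" for a in alpha])
```

Running the sketch should leave small precisions on the first two inputs and very large ones on the three irrelevant inputs. The cap on α_i only keeps the matrix inverse well conditioned.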

artificial intelligence, latent space, machine learning

1505.07765

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Traversing Knowledge Graphs in Vector Space

Guu, Kelvin, Miller, John, Liang, Percy

arXiv.org Artificial Intelligence, Aug-19-2015

Path queries on a knowledge graph can be used to answer compositional questions such as "What languages are spoken by people living in Lisbon?". However, knowledge graphs often have missing facts (edges), which disrupt path queries. Recent models for knowledge base completion impute missing facts by embedding knowledge graphs in vector spaces. We show that these models can be recursively applied to answer path queries, but that they suffer from cascading errors. This motivates a new "compositional" training objective, which dramatically improves all models' ability to answer path queries, in some cases more than doubling accuracy. On a standard knowledge base completion task, we also demonstrate that compositional training acts as a novel form of structural regularization, reliably improving performance across all base models (reducing errors by up to 43%) and achieving new state-of-the-art results.
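The phrase "recursively applied" can be made concrete with a minimal sketch that assumes a TransE-style model. In such a model each relation is a translation vector. A path query is answered by adding the relation vectors to the source entity's embedding and then ranking candidates by their distance to the result. The 2-D embeddings, relation names, and helper functions are hypothetical and hand-picked, not learned. The sketch shows only query-time traversal; it does not implement the compositional training objective.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration (not learned).
ENTITIES = {
    "lisbon": (0.0, 0.0),
    "portuguese": (1.0, 1.1),
    "english": (3.0, -1.0),
    "french": (-2.0, 2.0),
}
RELATIONS = {
    "lives_in_inverse": (1.0, 0.0),  # city -> people living there
    "speaks": (0.0, 1.0),            # person -> language
}


def traverse(source, path):
    """Compose a path query TransE-style: start at the source, add each relation vector."""
    point = list(ENTITIES[source])
    for rel in path:
        point = [p + r for p, r in zip(point, RELATIONS[rel])]
    return tuple(point)


def answer(source, path, candidates):
    """Rank candidate answers by Euclidean distance to the traversed point (closest first)."""
    q = traverse(source, path)
    return sorted(candidates, key=lambda name: math.dist(q, ENTITIES[name]))


ranking = answer("lisbon", ["lives_in_inverse", "speaks"], ["english", "french", "portuguese"])
print(ranking)
```

Each hop adds its own translation. An error in any single relation vector therefore carries into every later step, which is one way to picture the cascading errors that the abstract mentions.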

artificial intelligence, machine learning, query

1506.01094

Country:

Europe > Portugal > Lisbon > Lisbon (0.24)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Listen, Attend and Spell

Chan, William, Jaitly, Navdeep, Le, Quoc V., Vinyals, Oriol

arXiv.org Machine Learning, Aug-19-2015

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
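The two components can be sketched in a few lines under two simplifying assumptions. First, a pyramidal layer is assumed to halve the time resolution by concatenating consecutive pairs of frames; the recurrent layers themselves are omitted. Second, the speller's attention is assumed to score each listener feature by a plain dot product with the decoder state, whereas the paper's attention may transform both vectors first. `pyramid_reduce` and `attend` are hypothetical names.

```python
import math


def pyramid_reduce(frames):
    """Halve the time resolution by concatenating consecutive pairs of frames
    (a trailing odd frame is dropped). Frames are plain Python lists."""
    return [frames[t] + frames[t + 1] for t in range(0, len(frames) - 1, 2)]


def attend(decoder_state, listener_features):
    """Dot-product attention: softmax over energies, then a weighted sum (the context)."""
    energies = [sum(s * h for s, h in zip(decoder_state, feat)) for feat in listener_features]
    peak = max(energies)  # subtract the max before exp for numerical stability
    exps = [math.exp(e - peak) for e in energies]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(listener_features[0])
    context = [sum(w * feat[k] for w, feat in zip(weights, listener_features))
               for k in range(dim)]
    return weights, context


frames = [[float(t)] for t in range(8)]  # 8 one-dimensional "filter bank" frames (toy)
print(pyramid_reduce(frames))            # 4 frames, each now 2-dimensional
weights, context = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]])
print(weights, context)
```

Stacking three such reductions would shorten an utterance eightfold before the speller attends over it, which keeps the attention computation small.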

artificial intelligence, machine learning, natural language

1508.01211

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

When Are Tree Structures Necessary for Deep Learning of Representations?

Li, Jiwei, Luong, Minh-Thang, Jurafsky, Dan, Hovy, Eduard

arXiv.org Artificial Intelligence, Aug-18-2015

Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. But there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper we benchmark recursive neural models against sequential recurrent neural models (simple recurrent and LSTM models), enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; (4) semantic relation extraction (e.g., component-whole between nouns). Our goal is to understand better when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require associating headwords across a long distance, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help understand the limitations of both classes of models, and suggest directions for improving recurrent models.
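The clause-splitting strategy for recurrent models can be sketched as follows. This is a minimal illustration under several assumptions. A toy scalar recurrence and hypothetical scalar word embeddings stand in for the paper's LSTMs and word vectors. The clause states are then combined with a second recurrence, which is one plausible reading of "combining" rather than the authors' exact method. All helper names are hypothetical.

```python
import math
import re


def split_clauses(sentence):
    """Split a sentence into clause-like units at punctuation marks."""
    parts = re.split(r"[,;:.!?]", sentence)
    return [p.split() for p in parts if p.strip()]


def run_rnn(inputs, w_in=1.0, w_rec=0.5):
    """Toy scalar Elman recurrence: h_t = tanh(w_rec * h_{t-1} + w_in * x_t)."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_rec * h + w_in * x)
    return h


def encode_sentence(sentence, embed):
    """Encode each clause separately, then run a second recurrence over the clause states."""
    clause_states = [run_rnn([embed.get(tok.lower(), 0.0) for tok in clause])
                     for clause in split_clauses(sentence)]
    return clause_states, run_rnn(clause_states)


EMBED = {"cat": 0.3, "hungry": -0.2, "ate": 0.5}  # hypothetical scalar embeddings
clauses, sentence_vec = encode_sentence("The cat, which was hungry, ate.", EMBED)
print(clauses, sentence_vec)
```

Splitting keeps each recurrent pass short. This is the property the abstract credits for narrowing the gap with recursive models on long sequences.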

artificial intelligence, machine learning, natural language

1503.00185

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.71)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Generative Model for Multi-Dialect Representation

Osegi, Emmanuel N.

arXiv.org Machine Learning, Aug-17-2015

In the era of deep learning, several unsupervised models have been developed to capture the key features of unlabeled handwritten data. Popular among them is the Restricted Boltzmann Machine (RBM). However, owing to the novelty of handwritten multi-dialect data, the RBM may fail to generate an efficient representation. In this paper, we propose a generative model, the Mode Synthesizing Machine (MSM), for on-line representation of real-life handwritten multi-dialect language data. The MSM takes advantage of the hierarchical representation of the modes of a data distribution, using a two-point error update to learn a sequence of representative multi-dialects in a generative way. Experiments were performed to compare the performance of the MSM with that of the RBM, with the former attaining much lower error values than the latter on both independent and mixed data sets.

artificial intelligence, machine learning, natural language

1508.04035

Country: Africa > Nigeria (0.30)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Deep Learning Approach to Structured Signal Recovery

Mousavi, Ali, Patel, Ankit B., Baraniuk, Richard G.

arXiv.org Machine Learning, Aug-17-2015

In this paper, we develop a new framework for sensing and recovering structured signals. In contrast to compressive sensing (CS) systems that employ linear measurements, sparse representations, and computationally complex convex/greedy algorithms, we introduce a deep learning framework that supports both linear and mildly nonlinear measurements, that learns a structured representation from training data, and that efficiently computes a signal estimate. In particular, we apply a stacked denoising autoencoder (SDA) as an unsupervised feature learner. The SDA enables us to capture statistical dependencies between the different elements of certain signals and to improve signal recovery performance compared with the CS approach. Many configurations of x and Γ(·) have been explored in the literature for the problem of recovering a signal x from measurements y = Γ(x); one of the most useful is a sparse signal x with a linear Γ(·), i.e., y = Γ(x) = Φx. Compressive sensing (CS) [1]-[3] is a field that tries to solve this linear inverse problem when x has a sparse representation, i.e., when there exists an N × N basis matrix Ψ = [ψ_1, …, ψ_N] in which x has a sparse representation. Three questions arise: (i) How can the signal x be recovered from a given measurement vector y and operator Γ(·)? (ii) How should the measurement operator Γ(·) be designed? (iii) If we are concerned with a particular type of structure, how can we find a representation in which the signal x has that structure? Although there has been considerable progress in CS, particularly in answering these questions, our goal is to go beyond the state-of-the-art results. The authors are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005.
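Any learned recovery method, including the SDA-based framework described above, needs training pairs (y, x). The sketch below generates such pairs for the linear case y = Φx with k-sparse x. The Gaussian measurement matrix with N(0, 1/m) entries, the sizes, and the helper names are assumptions for illustration. The network that maps y back to x is not shown.

```python
import random


def sparse_signal(n, k, rng):
    """Length-n signal with exactly k nonzero Gaussian entries at random positions."""
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.gauss(0.0, 1.0)
    return x


def gaussian_matrix(m, n, rng):
    """m x n measurement matrix Phi with i.i.d. N(0, 1/m) entries (an assumed design)."""
    scale = (1.0 / m) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Linear measurement y = Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def make_training_pairs(num, n, m, k, seed=0):
    """Hypothetical helper: a fixed Phi plus `num` (y, x) pairs for training a recoverer."""
    rng = random.Random(seed)
    phi = gaussian_matrix(m, n, rng)
    pairs = []
    for _ in range(num):
        x = sparse_signal(n, k, rng)
        pairs.append((measure(phi, x), x))
    return phi, pairs


phi, pairs = make_training_pairs(num=100, n=64, m=16, k=4)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))  # pairs, measurements, signal length
```

With m = 16 measurements of a length-64 signal, the system is underdetermined. Sparsity, or a representation learned from such pairs, is what makes recovery possible.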

algorithm, artificial intelligence, machine learning

doi: 10.1109/ALLERTON.2015.7447163

1508.04065

Country: North America > United States > Texas > Harris County > Houston (0.24)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Unbounded Bayesian Optimization via Regularization

Shahriari, Bobak, Bouchard-Côté, Alexandre, de Freitas, Nando

arXiv.org Machine Learning, Aug-14-2015

Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning. Currently, the established Bayesian optimization practice requires a user-defined bounding box which is assumed to contain the optimizer. However, when little is known about the probed objective function, it can be difficult to prescribe such bounds. In this work we modify the standard Bayesian optimization framework in a principled way to allow automatic resizing of the search space. We introduce two alternative methods and compare them on two common synthetic benchmarking test functions as well as the tasks of tuning the stochastic gradient descent optimizer of a multi-layered perceptron and a convolutional neural network on MNIST.
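One simple way to avoid committing to a fixed bounding box is to enlarge it as the search proceeds. The sketch below grows an axis-aligned box about its centre so that its volume doubles, which means scaling each side by 2^(1/d). It is an illustrative heuristic of our own choosing and is not necessarily either of the two methods the paper introduces. The surrogate model and acquisition function are omitted, and the helper names are hypothetical.

```python
import math


def double_volume(lower, upper):
    """Grow an axis-aligned search box about its centre so that its volume doubles.
    Each side is scaled by 2**(1/d), where d is the number of dimensions."""
    d = len(lower)
    factor = 2.0 ** (1.0 / d)
    new_lower, new_upper = [], []
    for lo, hi in zip(lower, upper):
        centre = (lo + hi) / 2.0
        half = (hi - lo) / 2.0 * factor
        new_lower.append(centre - half)
        new_upper.append(centre + half)
    return new_lower, new_upper


def volume(lower, upper):
    """Volume of the box [lower, upper]."""
    return math.prod(hi - lo for lo, hi in zip(lower, upper))


# A unit square grown three times: volume 1 -> 2 -> 4 -> 8, centre unchanged.
lo, hi = [0.0, 0.0], [1.0, 1.0]
for _ in range(3):
    lo, hi = double_volume(lo, hi)
print(lo, hi, volume(lo, hi))
```

Growing the volume, rather than each side, by a fixed factor keeps the expansion rate comparable across problems of different dimensionality.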

artificial intelligence, machine learning, optimization

1508.03666

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Improving Decision Analytics with Deep Learning: The Case of Financial Disclosures

Fehrer, Ralph, Feuerriegel, Stefan

arXiv.org Machine Learning, Aug-9-2015

Decision analytics commonly focuses on text mining of financial news sources in order to provide managerial decision support and to predict stock market movements. Existing predictive frameworks almost exclusively apply traditional machine learning methods, whereas recent research indicates that these methods are not sufficiently capable of extracting suitable features and capturing the non-linear nature of complex tasks. As a remedy, novel deep learning models aim to overcome this issue by extending traditional neural network models with additional hidden layers. Indeed, deep learning has been shown to outperform traditional methods in terms of predictive performance. In this paper, we adapt this novel deep learning technique to financial decision support; specifically, we aim to predict the direction of stock movements following financial disclosures. We show that deep learning outperforms the accuracy of random forests, used as a machine learning benchmark, by 5.66%.

artificial intelligence, autoencoder, machine learning

1508.01993

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.66)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dependency-based Convolutional Neural Networks for Sentence Embedding

Ma, Mingbo, Huang, Liang, Xiang, Bing, Zhou, Bowen

arXiv.org Artificial Intelligence, Aug-3-2015

In sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies. To combine deep learning with linguistic structures, we propose a dependency-based convolution approach, making use of tree-based n-grams rather than surface ones, thus utilizing nonlocal interactions between words. Our model improves on sequential baselines on all four sentiment and question classification tasks, and achieves the highest published accuracy on TREC.
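A tree-based n-gram follows dependency arcs rather than surface word order. The sketch below extracts ancestor paths (word, parent, grandparent) from a parse given as head indices, with -1 marking the root. The example sentence, its parse, and the function name are hypothetical. The paper may also use other tree n-gram types (for example, siblings) that this sketch leaves out.

```python
def ancestor_ngrams(words, heads, n=3):
    """Collect n-grams that walk from each word up through its ancestors.
    heads[i] is the index of word i's head, or -1 if word i is the root."""
    grams = []
    for i in range(len(words)):
        path, j = [], i
        while j != -1 and len(path) < n:
            path.append(words[j])
            j = heads[j]
        if len(path) == n:  # keep only complete n-word ancestor paths
            grams.append(tuple(path))
    return grams


# Hypothetical parse of "I saw a really good movie" (root: "saw").
words = ["I", "saw", "a", "really", "good", "movie"]
heads = [1, -1, 5, 4, 5, 1]
print(ancestor_ngrams(words, heads))
```

Here "good movie saw" links an adjective to the verb two arcs above it. A sequential trigram window over "a really good movie" would never pair these two words.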

artificial intelligence, deep learning, machine learning

1507.01839

Country:

North America > United States > Hawaii (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (0.64)

Industry:

Media > Film (0.47)
Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Time-series modeling with undecimated fully convolutional neural networks

Mittelman, Roni

arXiv.org Machine Learning, Aug-3-2015

We present a new convolutional neural network-based time-series model. Typical convolutional neural network (CNN) architectures rely on max-pooling operators between layers, which reduces the resolution at the top layers. Instead, in this work we consider a fully convolutional network (FCN) architecture that uses causal filtering operations and allows the output signal to have the same rate as the input signal. We furthermore propose an undecimated version of the FCN, which we refer to as the undecimated fully convolutional neural network (UFCNN) and which is motivated by the undecimated wavelet transform. Our experimental results verify that the undecimated version of the FCN is necessary for effective time-series modeling. The UFCNN has several advantages over other time-series models such as the recurrent neural network (RNN) and long short-term memory (LSTM): it does not suffer from either the vanishing or the exploding gradient problem and is therefore easier to train. Convolution operations can also be implemented more efficiently than the recursion involved in RNN-based models. We evaluate the performance of our model on a synthetic target-tracking task using bearing-only measurements generated from a state-space model, on probabilistic modeling of polyphonic music sequences, and on a high-frequency trading task using a time series of ask/bid quotes and their corresponding volumes. Our experimental results on synthetic and real datasets verify the significant advantages of the UFCNN over the RNN and LSTM baselines.
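The abstract stresses two properties: causal filtering and an output rate equal to the input rate. Both can be sketched with a plain convolution that never pools. The sketch assumes, in the spirit of the undecimated (à trous) wavelet transform, that the filter at level j is upsampled by 2^j, i.e., applied with dilation 2^j. Learned weights, nonlinearities, and multiple channels are omitted, and the helper names are hypothetical.

```python
def causal_conv(x, kernel, dilation=1):
    """Causal 1-D convolution with an upsampled filter:
    y[t] = sum_k kernel[k] * x[t - k * dilation], treating x[t'] = 0 for t' < 0.
    The output has the same length (rate) as the input: no pooling, no decimation."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def undecimated_stack(x, kernels):
    """Apply one causal layer per level, doubling the dilation at each level (1, 2, 4, ...)."""
    out = x
    for level, kernel in enumerate(kernels):
        out = causal_conv(out, kernel, dilation=2 ** level)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(undecimated_stack(signal, [[0.5, 0.5]] * 3))  # six outputs for six inputs
```

Because no output depends on later inputs, the same stack can run online, one new sample at a time. This is the setting the trading experiment in the abstract requires.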

artificial intelligence, machine learning, neural network

1508.00317

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Leisure & Entertainment > Games (0.46)
Media > Music (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
