 Deep Learning


Imaging Time-Series to Improve Classification and Imputation

AAAI Conferences

Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoised Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18% to 48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.
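
The encoding named in this abstract can be illustrated with a minimal sketch, assuming the usual Gramian Angular Field construction: rescale the series, map each value to an angle with arccos, then take cos of pairwise angle sums (GASF) and sin of pairwise angle differences (GADF). The function names below are hypothetical and chosen for illustration; this is not the authors' implementation, and the MTF encoding and the networks are not covered.

```python
import math


def rescale(series, low=0.0, high=1.0):
    """Min-max rescale a series into [low, high]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    return [low + (high - low) * (v - lo) / (hi - lo) for v in series]


def gramian_angular_fields(scaled):
    """Return (GASF, GADF) for a series already rescaled into [-1, 1]."""
    # Clamp before acos to guard against tiny floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


def recover_from_gasf_diagonal(gasf):
    """Invert the GASF diagonal, assuming the series was rescaled to [0, 1].

    The diagonal holds cos(2 * phi_i) = 2 * x_i**2 - 1, so a non-negative
    x_i is recovered as sqrt((G_ii + 1) / 2).
    """
    return [math.sqrt(max(0.0, (gasf[i][i] + 1.0) / 2.0))
            for i in range(len(gasf))]


if __name__ == "__main__":
    scaled = rescale([1.0, 3.0, 2.0, 5.0])
    gasf, gadf = gramian_angular_fields(scaled)
    print(recover_from_gasf_diagonal(gasf))
```

The last helper shows why rescaling to [0, 1] matters for imputation: with non-negative values the diagonal of the GASF determines the series uniquely, so a reconstructed image can be mapped back to a time series.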


Equivalence Results between Feedforward and Recurrent Neural Networks for Sequences

AAAI Conferences

In the context of sequence processing, we study the relationship between single-layer feedforward neural networks, that have simultaneous access to all items composing a sequence, and single-layer recurrent neural networks which access information one step at a time. We treat both linear and nonlinear networks, describing a constructive procedure, based on linear autoencoders for sequences, that given a feedforward neural network shows how to define a recurrent neural network that implements the same function in time. Upper bounds on the required number of hidden units for the recurrent network as a function of some features of the feedforward network are given. By separating the functional from the memory component, the proposed procedure suggests new efficient learning as well as interpretation procedures for recurrent neural networks.


Image Feature Learning for Cold Start Problem in Display Advertising

AAAI Conferences

In online display advertising, state-of-the-art Click Through Rate (CTR) prediction algorithms rely heavily on historical information, and they work poorly on the growing number of new ads without any historical information. This is known as the cold start problem. For image ads, current state-of-the-art systems use handcrafted image features such as multimedia features and SIFT features to capture the attractiveness of ads. However, these handcrafted features are task dependent, inflexible and heuristic. In order to tackle the cold start problem in image display ads, we propose a new feature learning architecture to learn the most discriminative image features directly from raw pixels and user feedback in the target task. The proposed method is flexible and does not depend on human heuristics. Extensive experiments on a real world dataset with 47 billion records show that our feature learning method outperforms existing handcrafted features significantly, and it can extract discriminative and meaningful features.


Scalable Gaussian Process Regression Using Deep Neural Networks

AAAI Conferences

We propose a scalable Gaussian process model for regression by applying a deep neural network as the feature-mapping function. We first pre-train the deep neural network with a stacked denoising auto-encoder in an unsupervised way. Then, we perform a Bayesian linear regression on the top layer of the pre-trained deep network. The resulting model, Deep-Neural-Network-based Gaussian Process (DNN-GP), can learn much more meaningful representations of the data by the finite-dimensional but deep-layered feature-mapping function. Unlike standard Gaussian processes, our model scales well with the size of the training set due to the avoidance of kernel matrix inversion. Moreover, we present a mixture of DNN-GPs to further improve the regression performance. For the experiments on three representative large datasets, our proposed models significantly outperform the state-of-the-art algorithms of Gaussian process regression.
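
The Bayesian linear regression step on top-layer features can be sketched as follows, assuming the textbook conjugate model: a zero-mean isotropic Gaussian prior on the weights with precision `alpha` and Gaussian noise with precision `beta`. Both hyperparameters, the helper names, and the toy features in the usage part are assumptions made for illustration; in the abstract's setting the feature vectors would come from the pre-trained deep network. The point of the sketch is that only a D x D system in the feature dimension is solved, never an N x N kernel matrix over the training set.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_blr(features, targets, alpha=1.0, beta=25.0):
    """Posterior over weights: precision A = alpha*I + beta*Phi^T Phi,
    mean m solving A m = beta * Phi^T y."""
    d = len(features[0])
    precision = [[(alpha if i == j else 0.0)
                  + beta * sum(f[i] * f[j] for f in features)
                  for j in range(d)] for i in range(d)]
    rhs = [beta * sum(f[i] * t for f, t in zip(features, targets))
           for i in range(d)]
    return solve(precision, rhs), precision


def predict_blr(mean, precision, phi, beta=25.0):
    """Predictive mean m^T phi and variance 1/beta + phi^T A^{-1} phi."""
    mu = sum(m * p for m, p in zip(mean, phi))
    a_inv_phi = solve(precision, phi)
    var = 1.0 / beta + sum(p * s for p, s in zip(phi, a_inv_phi))
    return mu, var


if __name__ == "__main__":
    # Toy stand-in for top-layer features: a bias term and the raw input.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    feats = [[1.0, x] for x in xs]
    ys = [2.0 + 3.0 * x for x in xs]
    mean, prec = fit_blr(feats, ys, alpha=1e-6, beta=100.0)
    print(mean, predict_blr(mean, prec, [1.0, 2.0], beta=100.0))
```

The cost of fitting grows linearly in the number of training points and cubically only in the feature dimension, which is one way to read the scalability claim above.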


Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

AAAI Conferences

Deep neural networks (DNNs) show very strong performance on many machine learning problems, but they are very sensitive to the setting of their hyperparameters. Automated hyperparameter optimization methods have recently been shown to yield settings competitive with those found by human experts, but their widespread adoption is hampered by the fact that they require more computational resources than human experts. Humans have one advantage: when they evaluate a poor hyperparameter setting they can quickly detect (after a few SGD steps) that the resulting network performs poorly and terminate the corresponding evaluation to save time. Here, we mimic this early termination of bad runs based on a probabilistic model that extrapolates performance from the first part of a learning curve. Experiments with different neural network architectures show that our resulting approach speeds up state-of-the-art hyperparameter optimization methods for DNNs roughly twofold, enabling them to find DNN settings that yield better performance than those chosen by human experts.
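
The early-termination idea can be illustrated with a deliberately simplified sketch. The abstract describes a probabilistic model over learning curves; the code below swaps that for a single deterministic power-law fit, y(t) = a - b * t^(-c), chosen only for illustration. The curve family, the exponent grid, and all function names are assumptions, not the authors' method.

```python
def fit_power_law(curve, exponents=(0.25, 0.5, 1.0, 2.0)):
    """Fit y(t) = a - b * t**(-c) to the observed accuracies in `curve`.

    For each candidate exponent c the model is linear in (a, b), so a and b
    come from ordinary least squares; the c with the lowest squared error
    wins. Needs at least two observations.
    """
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two points to fit a curve")
    y_mean = sum(curve) / n
    best = None
    for c in exponents:
        z = [t ** (-c) for t in range(1, n + 1)]
        z_mean = sum(z) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, curve))
        slope = szy / szz            # slope equals -b
        a = y_mean - slope * z_mean
        sse = sum((a + slope * zi - yi) ** 2 for zi, yi in zip(z, curve))
        if best is None or sse < best[0]:
            best = (sse, a, -slope, c)
    _, a, b, c = best
    return a, b, c


def predict(params, step):
    """Extrapolated accuracy at training step `step` (1-based)."""
    a, b, c = params
    return a - b * step ** (-c)


def should_terminate(curve, horizon, best_so_far):
    """Stop a run whose extrapolated accuracy at `horizon` is below the
    best final accuracy seen so far."""
    return predict(fit_power_law(curve), horizon) < best_so_far


if __name__ == "__main__":
    partial = [0.9 - 0.5 * t ** -0.5 for t in range(1, 11)]
    print(should_terminate(partial, horizon=100, best_so_far=0.95))
```

A probabilistic version would replace the point estimate with a predictive distribution and terminate only when the probability of beating the best run falls below a threshold, which makes the decision more robust to noisy curves.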


Learning to Rap Battle with Bilingual Recursive Neural Networks

AAAI Conferences

We describe an unconventional line of attack in our quest to teach machines how to rap battle by improvising hip hop lyrics on the fly, in which a novel recursive bilingual neural network, TRAAM, implicitly learns soft, context-dependent generalizations over the structural relationships between associated parts of challenge and response raps, while avoiding the exponential complexity costs that symbolic models would require. TRAAM learns feature vectors simultaneously using context from both the challenge and the response, such that challenge-response association patterns with similar structure tend to have similar vectors. Improvisation is modeled as a quasi-translation learning problem, where TRAAM is trained to improvise fluent and rhyming responses to challenge lyrics. The soft structural relationships learned by our TRAAM model are used to improve the probabilistic responses generated by our improvisational response component.


The Scaffolded Sound Beehive

AAAI Conferences

The Scaffolded Sound Beehive is an immersive multi-media installation which provides viewers an artistic visual and audio experience of activities in a beehive. Data were recorded in urban beehives and processed using sophisticated pattern recognition, AI technologies, and sonification and computer graphics software. The installation includes an experiment in using Deep Learning to interpret the activities in the hive based on sound and microclimate recording.


Pseudo-Supervised Training Improves Unsupervised Melody Segmentation

AAAI Conferences

An important aspect of music perception in humans is the ability to segment streams of musical events into structural units such as motifs and phrases. A promising approach to the computational modeling of music segmentation employs the statistical and information-theoretic properties of musical data, based on the hypothesis that these properties can (at least partly) account for music segmentation in humans. Prior work has shown that in particular the information content of music events, as estimated from a generative probabilistic model of those events, is a good indicator for segment boundaries. In this paper we demonstrate that, remarkably, a substantial increase in segmentation accuracy can be obtained by not using information content estimates directly, but rather in a bootstrapping fashion. More specifically, we use information content estimates computed from a generative model of the data as a target for a feed-forward neural network that is trained to estimate the information content directly from the data. We hypothesize that the improved segmentation accuracy of this bootstrapping approach may be evidence that the generative model provides noisy estimates of the information content, which are smoothed by the feed-forward neural network, yielding more accurate information content estimates.


Learning Geographical Hierarchy Features for Social Image Location Prediction

AAAI Conferences

Image location prediction aims to estimate the geolocation where an image was taken. Social images contain heterogeneous content, which makes image location prediction nontrivial. Moreover, it is observed that image content patterns and location preferences correlate hierarchically. Traditional image location prediction methods mainly adopt a single-level architecture, which is not directly adaptable to the hierarchical correlation. In this paper, we propose a geographically hierarchical bi-modal deep belief network model (GH-BDBN), which is a compositional learning architecture that integrates a multi-modal deep learning model with a non-parametric hierarchical prior model. GH-BDBN learns a joint representation capturing the correlations among different types of image content using a bi-modal DBN, with a geographically hierarchical prior over the joint representation to model the hierarchical correlation between image content and location. Experimental results demonstrate the superiority of our model for image location prediction.


Deep Learning for Event-Driven Stock Prediction

AAAI Conferences

We propose a deep learning method for event-driven stock market prediction. First, events are extracted from news text, and represented as dense vectors, trained using a novel neural tensor network. Second, a deep convolutional neural network is used to model both short-term and long-term influences of events on stock price movements. Experimental results show that our model can achieve nearly 6% improvements on S&P 500 index prediction and individual stock prediction, respectively, compared to state-of-the-art baseline methods. In addition, market simulation results show that our system is more capable of making profits than previously reported systems trained on S&P 500 stock data.