 Deep Learning


Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework

AAAI Conferences

Recently, joint video-language modeling has attracted increasing attention. However, most existing approaches focus on exploring the language model on top of a fixed visual model. In this paper, we propose a unified framework that jointly models video and the corresponding text sentences. The framework consists of three parts: a compositional semantics language model, a deep video model and a joint embedding model. In our language model, we propose a dependency-tree structure model that embeds sentences into a continuous vector space, which preserves visually grounded meanings and word order. In the visual model, we leverage deep neural networks to capture essential semantic information from videos. In the joint embedding model, we minimize the distance between the outputs of the deep video model and the compositional language model in the joint space, and update these two models jointly. Based on these three parts, our system is able to accomplish three tasks: 1) natural language generation, 2) video retrieval and 3) language retrieval. In the experiments, the results show that our approach outperforms SVM, CRF and CCA baselines in predicting Subject-Verb-Object triplets and in natural sentence generation, and is better than CCA in video retrieval and language retrieval tasks.
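
The joint embedding step described in this abstract can be illustrated with a minimal sketch. It assumes the distance is squared Euclidean and, as a stand-in for backpropagating into the two networks, updates the two embedding vectors directly; the function names and the learning rate are hypothetical and not taken from the paper.

```python
def squared_distance(video_vec, text_vec):
    """Squared Euclidean distance between two embeddings in the joint space."""
    if len(video_vec) != len(text_vec):
        raise ValueError("embeddings must share the joint-space dimension")
    return sum((v - t) ** 2 for v, t in zip(video_vec, text_vec))


def joint_step(video_vec, text_vec, lr=0.1):
    """One gradient step that moves both embeddings toward each other.

    For the loss ||v - t||^2 the gradient is 2(v - t) with respect to v
    and -2(v - t) with respect to t, so both sides are updated together.
    """
    diff = [v - t for v, t in zip(video_vec, text_vec)]
    new_video = [v - lr * 2 * d for v, d in zip(video_vec, diff)]
    new_text = [t + lr * 2 * d for t, d in zip(text_vec, diff)]
    return new_video, new_text


# Toy embeddings (assumed values, for illustration only).
video, text = [1.0, 0.0], [0.0, 0.0]
before = squared_distance(video, text)
video, text = joint_step(video, text)
after = squared_distance(video, text)
```

In a full system the gradients would flow into the parameters of the video and language models rather than into the vectors themselves; the sketch only shows why a shared distance objective updates both sides at once.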


Recurrent Convolutional Neural Networks for Text Classification

AAAI Conferences

Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.
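
The max-pooling idea in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the word representations are given as plain lists of floats, and the helper name `max_pool_over_words` is hypothetical. For each feature dimension it keeps the largest value across all words and records which word supplied it, which is one way to see which words "play key roles".

```python
def max_pool_over_words(word_vectors):
    """Element-wise max over a sequence of equal-length word vectors.

    Returns the pooled vector and, per dimension, the index of the word
    that produced the maximum (the first such word on ties).
    """
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    pooled, winners = [], []
    for k in range(dim):
        column = [vec[k] for vec in word_vectors]
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


# Three words with two-dimensional representations (assumed toy values).
pooled, winners = max_pool_over_words([[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]])
```

Because the pooled vector has a fixed length regardless of how many words the text contains, it can be fed to a classifier for texts of any length.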


A Novel Neural Topic Model and Its Supervised Extension

AAAI Conferences

Topic modeling techniques have the benefit of modeling words and documents uniformly under a probabilistic framework. However, they also suffer from the limitations of sensitivity to initialization and unigram topic distribution, which can be remedied by deep learning techniques. To explore the combination of topic modeling and deep learning techniques, we first explain the standard topic model from the perspective of a neural network. Based on this, we propose a novel neural topic model (NTM) where the representations of words and documents are efficiently and naturally combined into a uniform framework. Extending NTM, we can easily add a label layer and propose the supervised neural topic model (sNTM) to tackle supervised tasks. Experiments show that our models are competitive in both topic discovery and classification/regression tasks.


Temporally Adaptive Restricted Boltzmann Machine for Background Modeling

AAAI Conferences

We examine the fundamental problem of background modeling, which is to model the background scenes in video sequences and segment the moving objects from the background. A novel approach is proposed based on the Restricted Boltzmann Machine (RBM) while exploiting the temporal nature of the problem. In particular, we augment the standard RBM to take a window of sequential video frames as input and generate the background model while enforcing that the background adapts smoothly to temporal changes. As a result, the augmented temporally adaptive model can generate a stable background given noisy inputs and adapt quickly to changes in the background while keeping all the advantages of RBMs, including exact inference and an effective learning procedure. Experimental results demonstrate the effectiveness of the proposed method in modeling the temporal nature of the background.


PD Disease State Assessment in Naturalistic Environments Using Deep Learning

AAAI Conferences

Management of Parkinson's Disease (PD) could be improved significantly if reliable, objective information about fluctuations in disease severity could be obtained in ecologically valid surroundings such as the private home. Although automatic assessment in PD has been studied extensively, so far no approach has been devised that is useful for clinical practice. Analysis approaches common in the field lack the capability of exploiting data from realistic environments, which represents a major barrier towards practical assessment systems. The very unreliable and infrequent labelling of ambiguous, low-resolution movement data collected in such environments represents a very challenging analysis setting, where advances would have significant societal impact in our ageing population. In this work we propose an assessment system that abides by practical usability constraints and applies deep learning to differentiate disease state in data collected in naturalistic settings. Based on a large dataset collected from 34 people with PD, we illustrate that deep learning outperforms other approaches in generalisation performance, despite the unreliable labelling characteristic of this problem setting, and how such systems could improve current clinical practice.


Efficient Benchmarking of Hyperparameter Optimizers via Surrogates

AAAI Conferences

Hyperparameter optimization is crucial for achieving peak performance with many machine learning algorithms; however, the evaluation of new optimization techniques on real-world hyperparameter optimization problems can be very expensive. Therefore, experiments are often performed using cheap synthetic test functions with characteristics rather different from those of real benchmarks of interest. In this work, we introduce another option: cheap-to-evaluate surrogates of real hyperparameter optimization benchmarks that share the same hyperparameter spaces and feature similar response surfaces. Specifically, we train regression models on data describing a machine learning algorithm's performance depending on its hyperparameter setting, and then cheaply evaluate hyperparameter optimization methods using the model's performance predictions in lieu of running the real algorithm. We evaluated a wide range of regression techniques, both in terms of how well they predict the performance of new hyperparameter settings and in terms of the quality of surrogate benchmarks obtained. We found that tree-based models capture the performance of several machine learning algorithms well and yield surrogate benchmarks that closely resemble real-world benchmarks, while being much easier to use and orders of magnitude cheaper to evaluate.
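
The surrogate workflow in this abstract can be sketched in a few lines. Everything concrete here is an assumption for illustration: the logged configurations and errors are made up, the regressor is a 1-nearest-neighbour lookup swapped in for brevity (the abstract reports that tree-based models worked well), and plain random search stands in for the optimizer under evaluation.

```python
import random


def fit_nearest_neighbor_surrogate(configs, scores):
    """Return a predictor that answers with the score of the closest logged config."""
    data = list(zip(configs, scores))

    def predict(config):
        def sq_dist(pair):
            return sum((a - b) ** 2 for a, b in zip(pair[0], config))
        return min(data, key=sq_dist)[1]

    return predict


def random_search(objective, bounds, n_trials, seed=0):
    """Minimise `objective` by sampling configurations uniformly within `bounds`."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = tuple(rng.uniform(low, high) for low, high in bounds)
        score = objective(config)  # cheap: queries the surrogate, not the real algorithm
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical logged runs: (log10 learning rate,) -> validation error.
logged_configs = [(-4.0,), (-3.0,), (-2.0,), (-1.0,)]
logged_errors = [0.30, 0.12, 0.08, 0.25]

surrogate = fit_nearest_neighbor_surrogate(logged_configs, logged_errors)
best_config, best_error = random_search(surrogate, [(-4.0, -1.0)], n_trials=50)
```

The optimizer only ever sees `surrogate`, so each trial costs a table lookup instead of a training run; swapping in a different regressor changes `fit_nearest_neighbor_surrogate` and nothing else.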


Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

AAAI Conferences

Sentiment analysis of online user-generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users have increasingly been using images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis. Motivated by the need to leverage large-scale yet noisy training data to solve the extremely challenging problem of image sentiment analysis, we employ Convolutional Neural Networks (CNN). We first design a suitable CNN architecture for image sentiment analysis. We obtain half a million training samples by using a baseline sentiment algorithm to label Flickr images. To make use of such noisy machine-labeled data, we employ a progressive strategy to fine-tune the deep network. Furthermore, we improve the performance on Twitter images by inducing domain transfer with a small number of manually labeled Twitter images. We have conducted extensive experiments on manually labeled Twitter images. The results show that the proposed CNN can achieve better performance in image sentiment analysis than competing algorithms.


Question/Answer Matching for CQA System via Combining Lexical and Sequential Information

AAAI Conferences

Community-based Question Answering (CQA) has become popular in knowledge sharing sites since it allows users to get answers to complex, detailed, and personal questions directly from other users. Large archives of historical questions and associated answers have been accumulated. Retrieving relevant historical answers that best match a question is an essential component of a CQA service. Most state-of-the-art approaches are based on bag-of-words models, which have proven successful in a range of text matching tasks, but are insufficient for capturing the important word sequence information in short text matching. In this paper, a new architecture is proposed to more effectively model the complicated matching relations between questions and answers. It utilises a similarity matrix which contains both lexical and sequential information. Afterwards the information is put into a deep architecture to find potentially suitable answers. The experimental study shows its potential in improving the matching accuracy of questions and answers.
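
To make the similarity-matrix idea concrete, here is a minimal sketch under stated assumptions. The abstract does not say how the matrix entries are computed, so exact word match is assumed as the lexical signal; the diagonal-run helper is a hypothetical illustration of how word order survives in the matrix, not a step from the paper.

```python
def lexical_similarity_matrix(question_tokens, answer_tokens):
    """Entry [i][j] is 1.0 when question word i equals answer word j, else 0.0."""
    return [[1.0 if q == a else 0.0 for a in answer_tokens]
            for q in question_tokens]


def longest_diagonal_run(matrix):
    """Length of the longest run of matches along any diagonal.

    A run of length n means n consecutive question words appear in the
    same order in the answer, which a bag-of-words model cannot see.
    """
    best = 0
    previous_row = [0] * (len(matrix[0]) if matrix else 0)
    for row in matrix:
        current_row = []
        for j, value in enumerate(row):
            above_left = previous_row[j - 1] if j > 0 else 0
            run = above_left + 1 if value > 0 else 0
            current_row.append(run)
            best = max(best, run)
        previous_row = current_row
    return best


question = "how to reset my password".split()
answer = "to reset my password click here".split()
matrix = lexical_similarity_matrix(question, answer)
shared_run = longest_diagonal_run(matrix)
```

Row and column positions carry the sequential information: the same four matches scattered off the diagonal would give the same bag-of-words overlap but a shorter run.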


To Drop or Not to Drop: Robustness, Consistency and Differential Privacy Properties of Dropout

arXiv.org Machine Learning

Training deep belief networks (DBNs) requires optimizing a non-convex function with an extremely large number of parameters. Naturally, existing gradient descent (GD) based methods are prone to arbitrarily poor local minima. In this paper, we rigorously show that such local minima can be avoided (up to an approximation error) by using the dropout technique, a widely used heuristic in this domain. In particular, we show that by randomly dropping a few nodes of a one-hidden-layer neural network, the training objective function, up to a certain approximation error, decreases by a multiplicative factor. On the flip side, we show that for training convex empirical risk minimizers (ERM), dropout in fact acts as a "stabilizer" or regularizer. That is, a simple dropout based GD method for convex ERMs is stable in the face of arbitrary changes to any one of the training points. Using the above assertion, we show that dropout provides fast rates for generalization error in learning (convex) generalized linear models (GLM). Moreover, using the above-mentioned stability properties of dropout, we design dropout based differentially private algorithms for solving ERMs. The learned GLM thus preserves the privacy of each of the individual training points while providing accurate predictions for new test points. Finally, we empirically validate our stability assertions for dropout in the context of convex ERMs and show that, surprisingly, dropout significantly outperforms (in terms of prediction accuracy) the L2 regularization based methods for several benchmark datasets.
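
A dropout-based gradient method for a convex ERM might look like the sketch below. The abstract does not specify the variant, so several choices are assumptions: squared loss on a linear model, dropout applied to input features, "inverted" rescaling of the kept features by 1/keep_prob, and the hypothetical function name and default step size.

```python
import random


def dropout_sgd_least_squares(xs, ys, keep_prob=0.5, lr=0.05, epochs=200, seed=0):
    """Stochastic gradient descent on squared loss with feature dropout.

    Each feature is kept with probability `keep_prob` (and rescaled by
    1 / keep_prob) or zeroed; the gradient uses the masked input.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)
    weights = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            masked = [xi / keep_prob if rng.random() < keep_prob else 0.0
                      for xi in x]
            residual = sum(w * m for w, m in zip(weights, masked)) - y
            # Gradient of (w . masked - y)^2 with respect to w is 2 * residual * masked.
            weights = [w - lr * 2 * residual * m for w, m in zip(weights, masked)]
    return weights


# Toy data generated by y = 2 * x (assumed, for illustration only).
xs, ys = [[1.0], [2.0]], [2.0, 4.0]
w_plain = dropout_sgd_least_squares(xs, ys, keep_prob=1.0)   # no dropout
w_drop_a = dropout_sgd_least_squares(xs, ys, keep_prob=0.5)  # with dropout
w_drop_b = dropout_sgd_least_squares(xs, ys, keep_prob=0.5)  # same seed, same masks
```

With `keep_prob=1.0` nothing is dropped and the routine reduces to ordinary stochastic gradient descent, which gives a convenient baseline when comparing against the dropout runs.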


Learning Stochastic Recurrent Networks

arXiv.org Machine Learning

Abstract

Leveraging advances in variational inference, we propose to enhance recurrent neural networks with latent variables, resulting in Stochastic Recurrent Networks (STORNs). The model i) can be trained with stochastic gradient methods, ii) allows structured and multi-modal conditionals at each time step, iii) features a reliable estimator of the marginal likelihood and iv) is a generalisation of deterministic recurrent neural networks. We evaluate the method on four polyphonic musical data sets and motion capture data.

1 Introduction

Recurrent Neural Networks (RNNs) are flexible and powerful tools for modeling sequences. While only bearing marginal existence in the 1990s, recent successes in real-world applications (Graves, 2013; Graves et al., 2013; Sutskever et al., 2014; Graves et al., 2008; Cho et al., 2014) have revived interest. This is partially due to architectural enhancements (Hochreiter & Schmidhuber, 1997), new optimisation findings (Martens & Sutskever, 2011; Sutskever et al., 2013; Bengio et al., 2012) and the increased computational power available to researchers.