Deep Learning
On the Performance of GoogLeNet and AlexNet Applied to Sketches
Ballester, Pedro (Federal University of Pelotas (UFPel)) | Araujo, Ricardo Matsumura (Federal University of Pelotas (UFPel))
We however show that both networks are largely unable to recognize most tested subjects, indicating that the learned representations are quite different from those of humans. We argue that such an approach can be useful to assess classifiers' generalization capabilities, in particular regarding the abstraction level of learned representations.
Convolutional Neural Networks (CNN) are considered the state-of-the-art model in image recognition tasks. Part of a deep learning approach to machine learning, CNN have been deployed successfully in a variety of applications, including face recognition (Lawrence et al. 1997), object classification (Szegedy et al. 2014) and generating scene descriptions (Pinheiro and Collobert 2013). This success can be partly attributed to advances in learning algorithms for deep architectures and partly to large labeled data sets made available. The main contribution of this work is to put forward an image recognition task where current state-of-the-art models differ significantly in performance when compared to humans.
A Deep Choice Model
Otsuka, Makoto (CREST, JST) | Osogami, Takayuki (IBM Research - Tokyo)
Human choice is complex in two ways. First, human choice often shows complex dependency on available alternatives. Second, human choice is often made after examining complex items such as images. The recently proposed choice model based on the restricted Boltzmann machine (RBM choice model) has been proved to represent three typical phenomena of human choice, which addresses the first complexity. We extend the RBM choice model to a deep choice model (DCM) to deal with the features of items, which are ignored in the RBM choice model. We then use deep learning to extract latent features from images and plug those latent features as input to the DCM. Our experiments show that the DCM adequately learns the choice that involves both of the two complexities in human choice.
Learning Deep Representation from Big and Heterogeneous Data for Traffic Accident Inference
Chen, Quanjun (The University of Tokyo) | Song, Xuan (The University of Tokyo) | Yamada, Harutoshi (The University of Tokyo) | Shibasaki, Ryosuke (The University of Tokyo)
With the rapid development of urbanization and public transportation systems, the number of traffic accidents has increased significantly worldwide over the past decades and become a major problem for human society. Because such accidents are possible yet unexpected, understanding what causes traffic accidents and raising early alarms for likely ones will play a critical role in planning effective traffic management. However, due to the lack of supporting sensing data, research on updating traffic accident risk in real time is very limited. Therefore, in this paper, we collect big and heterogeneous data (7 months of traffic accident data and 1.6 million users' GPS records) to understand how human mobility affects traffic accident risk. By mining these data, we develop a deep model based on a stacked denoising autoencoder to learn a hierarchical feature representation of human mobility. These features are then used for efficient prediction of the traffic accident risk level. Once trained, our model can simulate the corresponding traffic accident risk map given real-time input of human mobility. The experimental results demonstrate the efficiency of our model and suggest that traffic accident risk can be significantly more predictable through human mobility.
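The denoising autoencoder objective mentioned in this abstract can be illustrated with a minimal sketch. It shows one layer only, with masking noise, a sigmoid encoder, a tied-weight decoder, and a squared-error loss against the clean input. All of these choices, the layer sizes, and the input values are assumptions made for illustration; the abstract does not specify the authors' architecture or noise model.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each input with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def encode(x, W, b):
    """Hidden code h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def decode(h, W, c):
    """Tied-weight reconstruction x_hat = sigmoid(W^T h + c)."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i])
            for i in range(len(c))]

def denoising_loss(x, W, b, c, drop_prob, rng):
    """Squared error between the CLEAN input and the reconstruction
    computed from its corrupted copy."""
    x_hat = decode(encode(corrupt(x, drop_prob, rng), W, b), W, c)
    return sum((a - r) ** 2 for a, r in zip(x, x_hat))

rng = random.Random(0)
n_in, n_hidden = 6, 3  # hypothetical layer sizes
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
c = [0.0] * n_in

# Hypothetical normalized mobility densities for six grid cells.
x = [0.2, 0.9, 0.0, 0.5, 0.7, 0.1]

loss = denoising_loss(x, W, b, c, drop_prob=0.3, rng=rng)
code = encode(x, W, b)  # features that would feed the next layer when stacking
```

In a stacked model, each layer would be trained to minimize this loss on the codes produced by the layer below it, and the top-level codes would serve as features for the risk-level predictor. Training itself is omitted here.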
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark
You, Quanzeng (University of Rochester) | Luo, Jiebo (University of Rochester) | Jin, Hailin (Adobe Research) | Yang, Jianchao (Snapchat Inc)
Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reactions to images. To this end, different kinds of hand-tuned features have been proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the unavailability of confidently labeled and relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs.
Fusing Social Networks with Deep Learning for Volunteerism Tendency Prediction
Jia, Yongpo (National University of Singapore) | Song, Xuemeng (National University of Singapore) | Zhou, Jingbo (Big Data Lab, Baidu Research) | Liu, Li (National University of Singapore) | Nie, Liqiang (National University of Singapore) | Rosenblum, David S. (National University of Singapore)
Social networks contain a wealth of useful information. In this paper, we study the challenging task of integrating users' information from multiple heterogeneous social networks to gain a comprehensive understanding of users' interests and behaviors. Although much effort has been dedicated to studying this problem, most existing approaches adopt linear or shallow models to fuse information from multiple sources. Such approaches cannot properly capture the complex nature of, and relationships among, different social networks. Adopting deep learning approaches to learn a joint representation can better capture the complexity, but this neglects measuring the level of confidence in each source and the consistency among different sources. In this paper, we present a framework for multiple social network learning, whose core is a novel model that fuses social networks using deep learning with source confidence and consistency regularization. To evaluate the model, we apply it to predict individuals' tendency toward volunteerism. With extensive experimental evaluations, we demonstrate the effectiveness of our model, which outperforms several state-of-the-art approaches in terms of precision, recall and F1-score.
Improved Neural Machine Translation with SMT Features
He, Wei (Baidu Inc.) | He, Zhongjun (Baidu Inc.) | Wu, Hua (Baidu Inc.) | Wang, Haifeng (Baidu Inc.)
Neural machine translation (NMT) conducts end-to-end translation with a source language encoder and a target language decoder, achieving promising translation performance. However, as a newly emerged approach, the method has some limitations. An NMT system usually has to use a vocabulary of limited size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary problem. Furthermore, the decoder lacks a mechanism to guarantee that all source words are translated and usually favors short translations, resulting in fluent but inadequate translations. To solve these problems, we incorporate statistical machine translation (SMT) features, such as a translation model and an n-gram language model, with the NMT model under the log-linear framework. Our experiments show that the proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU points on NIST open test sets.
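As a rough illustration of the log-linear framework named in this abstract, the sketch below scores each candidate translation as a weighted sum of log-domain feature values and picks the highest-scoring one. The feature names, probabilities, weights, and the word-count feature are all hypothetical; the abstract mentions only a translation model and an n-gram language model and gives no numbers.

```python
import math

def log_linear_score(features, weights):
    """Weighted sum of feature values for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for two candidate translations of one sentence.
# "nmt", "tm" and "lm" are log-probabilities; "word_count" is an assumed
# length feature that counteracts a preference for short outputs.
candidates = {
    "short": {"nmt": math.log(0.20), "tm": math.log(0.01),
              "lm": math.log(0.05), "word_count": 3},
    "full":  {"nmt": math.log(0.10), "tm": math.log(0.08),
              "lm": math.log(0.04), "word_count": 6},
}

# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {"nmt": 1.0, "tm": 0.5, "lm": 0.5, "word_count": 0.1}

scores = {name: log_linear_score(f, weights) for name, f in candidates.items()}
best = max(scores, key=scores.get)

# Ranking by the NMT score alone, for comparison.
nmt_only_best = max(candidates, key=lambda n: candidates[n]["nmt"])
```

With these made-up numbers the NMT score alone prefers the short candidate, while the combined score prefers the full one, which is the kind of correction the abstract describes.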
Community-Based Question Answering via Heterogeneous Social Network Learning
Fang, Hanyin (Zhejiang University) | Wu, Fei (Zhejiang University) | Zhao, Zhou (Zhejiang University) | Duan, Xinyu (Zhejiang University) | Zhuang, Yueting (Zhejiang University) | Ester, Martin (Simon Fraser University)
Community-based question answering (cQA) sites have accumulated a vast number of questions and corresponding crowdsourced answers over time. How to efficiently share the underlying information and knowledge from reliable (usually highly reputable) answerers has become an increasingly popular research topic. A major challenge in cQA tasks is the accurate matching of high-quality answers to given questions. Many traditional approaches recommend answers based merely on the content similarity between questions and answers, and therefore suffer from the sparsity bottleneck of cQA data. In this paper, we propose a novel framework which encodes not only the contents of question-answer (Q-A) pairs but also the social interaction cues in the community to boost cQA tasks. More specifically, our framework collaboratively utilizes the rich interaction among questions, answers and answerers to learn the relative quality rank of different answers to the same question. Moreover, the information in heterogeneous social networks is comprehensively employed to enhance the quality of question-answering (QA) matching by our deep random walk learning framework. Extensive experiments on a large-scale dataset from a real-world cQA site show that leveraging the heterogeneous social information indeed achieves better performance than other state-of-the-art cQA methods.
Business-Aware Visual Concept Discovery from Social Media for Multimodal Business Venue Recognition
Chen, Bor-Chun (University of Maryland) | Chen, Yan-Ying (FX Palo Alto Laboratory) | Chen, Francine (FX Palo Alto Laboratory) | Joshi, Dhiraj (FX Palo Alto Laboratory)
Image localization is important for marketing and recommendation of local businesses; however, the level of granularity is still a critical issue. Given a consumer photo and its rough GPS information, we are interested in extracting the fine-grained location information, i.e. business venues, of the image. To this end, we propose a novel framework for business venue recognition. The framework mainly contains three parts. First, business-aware visual concept discovery: we mine a set of concepts that are useful for business venue recognition based on three guidelines: business awareness, visual detectability, and discriminative power. We define concepts that satisfy all three criteria as business-aware visual concepts. Second, business-aware concept detection by convolutional neural networks (BA-CNN): we propose a new network configuration that can incorporate semantic signals mined from business reviews for extracting semantic concept features from a query image. Third, multimodal business venue recognition: we extend visually detected concepts to multimodal feature representations that allow a test image to be associated with business reviews and images from social media for business venue recognition. The experimental results show that the visual concepts detected by BA-CNN can achieve up to 22.5% relative improvement for business venue recognition compared to state-of-the-art convolutional neural network features. Experiments also show that by leveraging multimodal information from social media we can further boost the performance, especially when the database images belonging to each business venue are scarce.
MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-Based Protein Structure Prediction
Lin, Zeming (University of Virginia) | Lanchantin, Jack (University of Virginia) | Qi, Yanjun (University of Virginia)
Predicting protein properties such as solvent accessibility and secondary structure from a protein's primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window-based multilayer perceptron. Taking inspiration from the image classification domain, we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position predictions on protein sequences. Our model is significantly simpler than the state-of-the-art, yet achieves better results. By combining MUST and the efficient convolution operation, we can consider far more parameters while retaining very fast prediction speeds. We beat the state-of-the-art performance on two large protein property prediction datasets.
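The general shift-and-stitch idea behind dense per-position prediction can be sketched in one dimension. A network with a stride-2 pooling layer emits one output per two inputs; running it on each shifted copy of the input and interleaving the results recovers one output per position. The sketch below uses a single max-pooling step as a toy stand-in for the network and zero right-padding, both of which are assumptions; it does not reproduce the multilayer variant the abstract describes.

```python
def pooled_net(seq, stride=2):
    """Toy stand-in for a network with one pooling layer: max over
    non-overlapping windows of width `stride` (coarse output)."""
    return [max(seq[i:i + stride])
            for i in range(0, len(seq) - stride + 1, stride)]

def shift_and_stitch(seq, stride=2, pad=0):
    """Run the coarse network on every shift of the input and interleave
    the outputs so that each input position gets a prediction."""
    n = len(seq)
    # Right-pad so every shifted copy still yields enough full windows.
    padded = list(seq) + [pad] * (stride - 1)
    dense = [None] * n
    for shift in range(stride):
        coarse = pooled_net(padded[shift:], stride)
        # Output k of the shifted run belongs to position shift + k * stride.
        for k, value in enumerate(coarse):
            pos = shift + k * stride
            if pos < n:
                dense[pos] = value
    return dense

def dense_reference(seq, stride=2, pad=0):
    """Same windowed max applied at every position (stride 1), for checking."""
    padded = list(seq) + [pad] * (stride - 1)
    return [max(padded[i:i + stride]) for i in range(len(seq))]

signal = [1, 5, 2, 4, 3, 0]          # hypothetical per-residue scores
coarse_only = pooled_net(signal)      # half-resolution output
stitched = shift_and_stitch(signal)   # one output per input position
```

The stitched result matches `dense_reference(signal)`, so the full resolution comes from `stride` passes of the coarse network rather than from a different, slower dense network.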
Want an open-source deep learning framework? Take your pick
Earlier this week, Google made a splash when it released its TensorFlow artificial intelligence software on GitHub under an open-source license. Google has a sizable stable of AI talent, and AI is working behind the scenes in popular products, including Gmail and Google search, so AI tools from Google are a big deal. Today on GitHub, TensorFlow, primarily written in C++, is the top trending project of the day, the week, and the month, having accrued more than 10,000 stars in about one week. But there are several other open-source tools to choose from on GitHub if you want to improve your app with deep learning, a type of AI that involves training artificial neural networks on a bunch of data and then getting them to make inferences about new data. There are other frameworks available today -- these are just the most interesting ones I've encountered -- and more will surely emerge in the future.