University of Science and Technology of China
How Images Inspire Poems: Generating Classical Chinese Poetry from Images with Memory Networks
Xu, Linli (University of Science and Technology of China) | Jiang, Liang (University of Science and Technology of China) | Qin, Chuan (University of Science and Technology of China) | Wang, Zhe (Ant Financial Services Group) | Du, Dongfang (University of Science and Technology of China)
With the recent advances of neural models and natural language processing, automatic generation of classical Chinese poetry has drawn significant attention due to its artistic and cultural value. Previous works mainly focus on generating poetry given keywords or other textual information, while visual inspiration for poetry has rarely been explored. Generating poetry from images is much more challenging than generating poetry from text, since images contain very rich visual information which cannot be described completely using several keywords, and a good poem should convey the image accurately. In this paper, we propose a memory-based neural model which exploits images to generate poems. Specifically, an Encoder-Decoder model with a topic memory network is proposed to generate classical Chinese poetry from images. To the best of our knowledge, this is the first work attempting to generate classical Chinese poetry from images with neural networks. A comprehensive experimental investigation with both human evaluation and quantitative analysis demonstrates that the proposed model can generate poems which convey images accurately.
Video-Based Sign Language Recognition Without Temporal Segmentation
Huang, Jie (University of Science and Technology of China) | Zhou, Wengang (University of Science and Technology of China) | Zhang, Qilin (HERE Technologies, Chicago, Illinois) | Li, Houqiang (University of Science and Technology of China) | Li, Weiping (University of Science and Technology of China)
Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so the automatic translation of sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR), i.e., isolated SLR that recognizes words one by one and continuous SLR that translates entire sentences. Existing continuous SLR methods typically utilize isolated SLR as a building block, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign language recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing step of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large-scale datasets. Experimental results demonstrate the effectiveness of the proposed framework.
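The latent-space component can be illustrated with a minimal sketch: video features and sentence embeddings are projected into a shared space, and candidate sentences are ranked by similarity to the video. The dimensions, random projection matrices, and cosine scoring rule below are illustrative assumptions, not the learned mappings of LS-HAN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; all values here are illustrative, not from the paper.
d_video, d_text, d_latent = 8, 6, 4

# Random projections standing in for the learned maps into the shared space.
W_v = rng.normal(size=(d_latent, d_video))
W_t = rng.normal(size=(d_latent, d_text))

def to_latent(x, W):
    """Project a feature vector into the latent space and L2-normalize it."""
    z = W @ x
    return z / np.linalg.norm(z)

video_feat = rng.normal(size=d_video)       # e.g. pooled two-stream CNN output
sentences = rng.normal(size=(3, d_text))    # candidate sentence embeddings

v = to_latent(video_feat, W_v)
scores = [float(to_latent(s, W_t) @ v) for s in sentences]
best = int(np.argmax(scores))               # sentence closest to the video
```

Training would pull matching video/sentence pairs together in this space; here the projections are random, so the ranking is arbitrary but the scoring mechanics are the same.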
Exercise-Enhanced Sequential Modeling for Student Performance Prediction
Su, Yu (Anhui University) | Liu, Qingwen (iFLYTEK CO., LTD.) | Liu, Qi (University of Science and Technology of China) | Huang, Zhenya (University of Science and Technology of China) | Yin, Yu (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China) | Ding, Chris (University of Texas at Arlington) | Wei, Si (iFLYTEK CO., LTD.) | Hu, Guoping (iFLYTEK CO., LTD.)
In online education systems, for offering proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented by its manually labeled knowledge concepts, and the richer information contained in the text description of exercises remains underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction that takes full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description, without requiring any expert knowledge and without information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process with the combination of exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with the Markov property and EERNNA with an attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating exercise correlations, EERNN can handle the cold-start problem from both the student and the exercise perspective.
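The attention strategy (EERNNA) can be sketched as follows: the prediction for a new exercise attends over past student states, weighted by the similarity between the new exercise's representation and each past exercise's representation. The shapes, the dot-product similarity, and the sigmoid scoring below are simplifying assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def attention_predict(past_states, past_exercises, current_exercise):
    """Predict a correctness probability for the current exercise.

    Attention weights come from the similarity between the current exercise
    representation and each past exercise representation; the attended
    student state is then scored against the current exercise.
    """
    sims = past_exercises @ current_exercise      # similarity to each past step
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()                      # softmax attention weights
    attended_state = weights @ past_states        # weighted sum of past states
    logit = attended_state @ current_exercise     # simple dot-product score
    return 1.0 / (1.0 + np.exp(-logit))           # probability of a correct answer

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 4))   # 5 past hidden (knowledge) states
X = rng.normal(size=(5, 4))   # 5 past exercise representations
x_new = rng.normal(size=4)    # representation of the next exercise
p = attention_predict(H, X, x_new)
```

The Markov variant (EERNNM) would instead use only the most recent state; the attention variant lets similar past exercises contribute more to the prediction.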
Bridging Video Content and Comments: Synchronized Video Description with Temporal Summarization of Crowdsourced Time-Sync Comments
Xu, Linli (University of Science and Technology of China) | Zhang, Chao (University of Science and Technology of China)
With the rapid growth of online media sharing, we are facing a huge collection of videos. In the meantime, due to the volume and complexity of video data, it can be tedious and time consuming to index or annotate videos. In this paper, we propose to generate temporal descriptions of videos by exploiting the information in crowdsourced time-sync comments, which are gaining popularity on many video sharing websites. In this framework, representative and interesting comments of a video are selected and highlighted along the timeline, providing an informative description of the video in a time-sync manner. The challenge of the proposed application comes from the extremely informal and noisy nature of the comments, which are usually short sentences covering widely different topics. To resolve these issues, we propose a novel temporal summarization model based on the data reconstruction principle, where representative comments are selected so as to best reconstruct the original corpus at both the text level and the topic level, while incorporating the temporal correlations of the comments. Experimental results on real-world data demonstrate the effectiveness of the proposed framework and justify the idea of exploiting crowdsourced time-sync comments as a bridge to describe videos.
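The data reconstruction principle can be illustrated with a stripped-down sketch: greedily pick the comments whose span best reconstructs every comment vector in the corpus under least squares. This omits the paper's topic-level term and temporal correlations, and the TF-style vectors are synthetic; it is a simplified stand-in, not the paper's model.

```python
import numpy as np

def greedy_reconstruct(comments, k):
    """Select k comment vectors that best reconstruct the whole corpus.

    Greedy forward selection: at each step, add the comment that most
    reduces the total least-squares reconstruction error of all comments
    from the span of the selected set.
    """
    n = comments.shape[0]
    selected = []
    for _ in range(k):
        best_err, best_j = None, None
        for j in range(n):
            if j in selected:
                continue
            S = comments[selected + [j]]  # candidate basis (rows)
            # Least-squares reconstruction of every comment from the basis.
            coef, *_ = np.linalg.lstsq(S.T, comments.T, rcond=None)
            err = np.linalg.norm(comments.T - S.T @ coef)
            if best_err is None or err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
    return selected

rng = np.random.default_rng(2)
C = rng.normal(size=(10, 6))   # 10 comments as TF-style vectors
picked = greedy_reconstruct(C, 3)
```

Greedy selection is a common tractable surrogate for the combinatorial objective; the selected indices mark the comments to highlight along the timeline.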
Modeling Users’ Preferences and Social Links in Social Networking Services: A Joint-Evolving Perspective
Wu, Le (University of Science and Technology of China) | Ge, Yong (University of North Carolina at Charlotte) | Liu, Qi (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China) | Long, Bai (China Electronics Technology Group Corporation No.38 Research Institute) | Huang, Zhenya (University of Science and Technology of China)
Researchers have long agreed that the evolution of a Social Networking Service (SNS) platform is driven by the interplay between users' preferences (reflected in user-item consumption behavior) and the social network structure (reflected in user-user interaction behavior), with both kinds of behaviors changing over time. However, traditional approaches either modeled these two kinds of behaviors in isolation or relied on a static assumption about the SNS. Thus, it is still unclear how users' historical preferences and the dynamic social network structure affect the evolution of SNSs. Furthermore, can jointly modeling users' temporal behaviors in SNSs benefit both behavior prediction tasks? In this paper, we leverage the underlying social theories (i.e., social influence and the homophily effect) to investigate the interplay and evolution of SNSs. We propose a probabilistic approach that fuses these social theories to jointly model users' temporal behaviors in SNSs, so that the model has both explanatory ability and predictive power. Experimental results on two real-world datasets demonstrate the effectiveness of our proposed model.
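The two social theories give the joint evolution its two directions: social influence moves a user's preferences toward those of friends, while homophily makes links more likely between users with similar preferences. A minimal sketch, with the drift rate, the averaging rule, and the sigmoid link probability all chosen for illustration rather than taken from the paper's probabilistic model:

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, d = 4, 3
U = rng.normal(size=(n_users, d))          # latent preference vectors
A = np.array([[0, 1, 1, 0],                # current social links (symmetric)
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def social_influence_step(U, A, alpha=0.3):
    """Social influence: each user's preferences drift toward friends' average."""
    deg = A.sum(axis=1, keepdims=True)
    friend_avg = (A @ U) / np.maximum(deg, 1.0)
    return (1 - alpha) * U + alpha * friend_avg

def homophily_link_prob(U):
    """Homophily: link probability grows with preference similarity."""
    sims = U @ U.T
    return 1.0 / (1.0 + np.exp(-sims))

U_next = social_influence_step(U, A)       # preferences evolve via the network
P_link = homophily_link_prob(U_next)       # network evolves via preferences
```

Iterating these two steps couples the item-consumption side and the link-formation side, which is the joint-evolving perspective the abstract describes.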