Ant Financial Services Group
How Images Inspire Poems: Generating Classical Chinese Poetry from Images with Memory Networks
Xu, Linli (University of Science and Technology of China) | Jiang, Liang (University of Science and Technology of China) | Qin, Chuan (University of Science and Technology of China) | Wang, Zhe (Ant Financial Services Group) | Du, Dongfang (University of Science and Technology of China)
With the recent advances in neural models and natural language processing, automatic generation of classical Chinese poetry has drawn significant attention due to its artistic and cultural value. Previous works mainly focus on generating poetry given keywords or other text information, while visual inspirations for poetry have rarely been explored. Generating poetry from images is much more challenging than generating poetry from text, since images contain very rich visual information which cannot be described completely using several keywords, and a good poem should convey the image accurately. In this paper, we propose a memory-based neural model which exploits images to generate poems. Specifically, an Encoder-Decoder model with a topic memory network is proposed to generate classical Chinese poetry from images. To the best of our knowledge, this is the first work attempting to generate classical Chinese poetry from images with neural networks. A comprehensive experimental investigation with both human evaluation and quantitative analysis demonstrates that the proposed model can generate poems which convey images accurately.
Privacy Preserving Point-of-Interest Recommendation Using Decentralized Matrix Factorization
Chen, Chaochao (Ant Financial Services Group) | Liu, Ziqi (Ant Financial Services Group) | Zhao, Peilin (Ant Financial Services Group) | Zhou, Jun (Ant Financial Services Group) | Li, Xiaolong (Ant Financial Services Group)
Point-of-interest (POI) recommendation has drawn much attention recently due to the increasing popularity of location-based networks, e.g., Foursquare and Yelp. Among the existing approaches to POI recommendation, Matrix Factorization (MF) based techniques have proven to be effective. However, existing MF approaches suffer from two major problems: (1) Expensive computation and storage due to the centralized model training mechanism: the centralized learner has to maintain the whole user-item rating matrix and potentially huge low-rank matrices. (2) Privacy issues: the users' preferences are at risk of leaking to malicious attackers via the centralized learner. To solve these problems, we present a Decentralized MF (DMF) framework for POI recommendation. Specifically, instead of maintaining all the low-rank matrices and sensitive rating data centrally for training, we propose a random walk based decentralized training technique to train MF models on each user's end device, e.g., a cell phone or tablet. By doing so, the ratings of each user stay in the user's own hands; moreover, decentralized learning can be viewed as distributed learning with multiple learners (users), and thus alleviates the computation and storage issue. Experimental results on two real-world datasets demonstrate that, compared with classic and state-of-the-art latent factor models, DMF significantly improves recommendation performance in terms of precision and recall.
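The general idea of training MF on each user's device and sharing only item gradients along a random walk can be illustrated with a toy sketch. This is not the DMF algorithm from the paper: the class and function names (`UserNode`, `random_walk`, `train`), the neighbor graph, the ratings, and all hyperparameters are assumptions made for illustration. Each node keeps its ratings and user vector private, holds a local copy of the item vectors, and sends only item gradients to the users reached by a short random walk.

```python
import random


class UserNode:
    """One user's device: private ratings, a user vector, local item vectors."""

    def __init__(self, ratings, n_items, dim, rng):
        self.ratings = ratings  # {item_id: rating}; never leaves this node
        self.p = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                  for _ in range(n_items)]

    def predict(self, item):
        return sum(a * b for a, b in zip(self.p, self.q[item]))

    def local_step(self, item, lr, reg):
        """SGD step on one local rating; returns the item gradient to share."""
        err = self.ratings[item] - self.predict(item)
        q_i = self.q[item]
        grad_q = [-err * pu + reg * qi for pu, qi in zip(self.p, q_i)]
        grad_p = [-err * qi + reg * pu for pu, qi in zip(self.p, q_i)]
        self.p = [pu - lr * g for pu, g in zip(self.p, grad_p)]
        self.q[item] = [qi - lr * g for qi, g in zip(q_i, grad_q)]
        return grad_q

    def apply_item_gradient(self, item, grad, lr):
        """Apply an item gradient received from another user."""
        self.q[item] = [qi - lr * g for qi, g in zip(self.q[item], grad)]


def random_walk(start, neighbors, length, rng):
    """Return the users (other than start) visited by a short random walk."""
    visited, current = [], start
    for _ in range(length):
        if not neighbors[current]:
            break
        current = rng.choice(neighbors[current])
        if current != start:
            visited.append(current)
    return visited


def train(nodes, neighbors, epochs=300, lr=0.02, reg=0.01, walk_len=2, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        for u, node in enumerate(nodes):
            for item in node.ratings:
                grad = node.local_step(item, lr, reg)
                # Only the item gradient is sent; ratings stay on the device.
                for v in random_walk(u, neighbors, walk_len, rng):
                    nodes[v].apply_item_gradient(item, grad, lr)


def local_rmse(nodes):
    """RMSE of each node's predictions on its own ratings, pooled."""
    errs = [(r - node.predict(i)) ** 2
            for node in nodes for i, r in node.ratings.items()]
    return (sum(errs) / len(errs)) ** 0.5


# Made-up data: three users, four POIs, a small neighbor graph.
init_rng = random.Random(42)
user_ratings = [{0: 5.0, 1: 3.0}, {0: 4.0, 2: 1.0}, {1: 2.0, 3: 5.0}]
nodes = [UserNode(r, n_items=4, dim=4, rng=init_rng) for r in user_ratings]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

rmse_before = local_rmse(nodes)
train(nodes, neighbors)
rmse_after = local_rmse(nodes)
```

In this sketch no central learner ever sees the rating matrix: the only message that crosses devices is a gradient for a single item vector.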
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
Cao, Shaosheng (Ant Financial Services Group; Singapore University of Technology and Design) | Lu, Wei (Singapore University of Technology and Design) | Zhou, Jun (Ant Financial Services Group) | Li, Xiaolong (Ant Financial Services Group)
We propose cw2vec, a novel method for learning Chinese word embeddings. It is based on our observation that exploiting stroke-level information is crucial for improving the learning of Chinese word embeddings. Specifically, we design a minimalist approach to exploit such features, by using stroke n-grams, which capture semantic and morphological level information of Chinese words. Through qualitative analysis, we demonstrate that our model is able to extract semantic information that cannot be captured by existing methods. Empirical results on the word similarity, word analogy, text classification and named entity recognition tasks show that the proposed approach consistently outperforms state-of-the-art approaches such as word-based word2vec and GloVe, character-based CWE, component-based JWE and pixel-based GWE.
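As a rough illustration of what stroke n-grams look like, the sketch below turns a word into a sequence of stroke-class IDs and slides windows of several lengths over it. The stroke table, the five-class ID scheme (1 horizontal, 3 left-falling, 4 right-falling), the n-gram length range, and the function names are assumptions for illustration and are not taken from the abstract; a real system would need a full character-to-stroke dictionary.

```python
# Hypothetical toy table mapping characters to stroke-class ID strings.
# Assumed IDs: 1 = horizontal, 3 = left-falling, 4 = right-falling.
STROKES = {
    "大": "134",  # horizontal, left-falling, right-falling
    "人": "34",   # left-falling, right-falling
}


def word_to_strokes(word):
    """Concatenate the stroke IDs of every character in the word."""
    return "".join(STROKES[ch] for ch in word)


def stroke_ngrams(word, n_min=3, n_max=5):
    """List all stroke n-grams of the word for n in [n_min, n_max]."""
    seq = word_to_strokes(word)
    return [seq[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]


ngrams = stroke_ngrams("大人")
```

Because the n-grams are taken over the whole word's stroke sequence, some of them span character boundaries, which is one way sub-character features can be shared between words.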
Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding
Lv, Guangyi (University of Science and Technology of China) | Xu, Tong (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China) | Liu, Qi (University of Science and Technology of China) | Zheng, Yi (Ant Financial Services Group)
Recent years have witnessed a boom in online shared media content, which raises significant challenges for effective management and retrieval. Though considerable effort has been made, precise retrieval of video shots with certain topics has been largely ignored. At the same time, due to the popularity of novel time-sync comments, or so-called "bullet-screen comments", video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propose a novel video understanding framework to assign temporal labels to highlighted video shots. Specifically, because bullet-screen comments are informally expressed, we first propose a temporal deep structured semantic model (T-DSSM) to represent comments as semantic vectors by taking advantage of their temporal correlation. Then, video highlights are recognized and labeled via the semantic vectors in a supervised way. Extensive experiments on a real-world dataset show that our framework labels video highlights effectively, outperforming baselines by a significant margin, which validates the potential of our framework for video understanding as well as for bullet-screen comment interpretation.