Asia
An Association Network for Computing Semantic Relatedness
Zhang, Keyang (Shanghai Jiao Tong University) | Zhu, Kenny (Shanghai Jiao Tong University) | Hwang, Seung-won (POSTECH)
To judge how much a pair of words (or texts) are semantically related is a cognitive process. However, previous algorithms for computing semantic relatedness are largely based on co-occurrences within textual windows, and do not actively leverage cognitive human perceptions of relatedness. To bridge this perceptional gap, we propose to utilize free association as signals to capture such human perceptions. However, free association, being manually evaluated, has limited lexical coverage and is inherently sparse. We propose to expand lexical coverage and overcome sparseness by constructing an association network of terms and concepts that combines signals from free association norms and five types of co-occurrences extracted from the rich structures of Wikipedia. Our evaluation results validate that simple algorithms on this network give competitive results in computing semantic relatedness between words and between short texts.
Learning User-Specific Latent Influence and Susceptibility from Information Cascades
Wang, Yongqing (Institute of Computing Technology, Chinese Academy of Sciences) | Shen, Huawei (Institute of Computing Technology, Chinese Academy of Sciences) | Liu, Shenghua (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
Predicting cascade dynamics has important implications for understanding information propagation and launching viral marketing. Previous works mainly adopt a pair-wise approach, modeling the propagation probability between pairs of users using n^2 independent parameters for n users. Consequently, these models suffer from severe overfitting, especially for pairs of users without direct interactions, which limits their prediction accuracy. Here we propose to model the cascade dynamics by learning two low-dimensional user-specific vectors from observed cascades, capturing each user's influence and susceptibility respectively. This model requires far fewer parameters and can thus combat overfitting. Moreover, this model can naturally capture context-dependent factors such as the cumulative effect in information propagation. Extensive experiments on a synthetic dataset and a large-scale microblogging dataset demonstrate that this model outperforms the existing pair-wise models at predicting cascade dynamics, cascade size, and "who will be retweeted."
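The parameter saving described in this abstract can be illustrated with a small sketch. The abstract does not state how the two vectors are combined, so the inner-product score below is an assumption, and the function names `parameter_counts` and `transmission_score` are hypothetical.

```python
# Illustrative sketch only: the inner-product scoring is an assumption,
# not the model defined in the paper.

def parameter_counts(n_users, dim):
    """Return (pairwise, latent) parameter counts for n_users users."""
    pairwise = n_users * n_users  # one parameter per ordered pair of users
    latent = 2 * n_users * dim    # one influence and one susceptibility vector per user
    return pairwise, latent

def transmission_score(influence_u, susceptibility_v):
    """Hypothetical score for user u influencing user v: inner product of the two vectors."""
    return sum(i * s for i, s in zip(influence_u, susceptibility_v))

pairwise, latent = parameter_counts(n_users=1000, dim=10)
print(pairwise, latent)  # 1000000 20000
```

With 1000 users and 10-dimensional vectors, the latent representation uses 50 times fewer parameters than the pair-wise one, and it still yields a score for pairs of users that never interacted directly.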
Personalized Tag Recommendation through Nonlinear Tensor Factorization Using Gaussian Kernel
Fang, Xiaomin (Sun Yat-sen University) | Pan, Rong (Sun Yat-sen University) | Cao, Guoxiang (Huawei Technologies Co. Ltd) | He, Xiuqiang (Huawei Technologies Co. Ltd) | Dai, Wenyuan (Huawei Technologies Co. Ltd)
Personalized tag recommendation systems recommend a list of tags to a user who is about to annotate an item. They exploit the individual preferences of the user and the characteristics of the items. Tensor factorization techniques have been applied in many areas, including tag recommendation. Models based on Tucker Decomposition can achieve good performance but require a lot of computation power. On the other hand, models based on Canonical Decomposition can run in linear time and are more feasible for online recommendation. In this paper, we propose a novel method for personalized tag recommendation, which can be considered a nonlinear extension of Canonical Decomposition. Unlike linear tensor factorization, we exploit the Gaussian radial basis function to increase the model's capacity. The experimental results show that our proposed method outperforms the state-of-the-art methods for tag recommendation on real datasets and performs well even with a small number of features, which verifies that our models can make better use of features.
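To make the contrast between linear and kernel-based scoring concrete, here is a minimal sketch. `cd_score` is the standard Canonical Decomposition score for a (user, item, tag) triple. The abstract does not give the nonlinear formula, so `nonlinear_score` is only a hypothetical way of plugging a Gaussian radial basis function into the scoring and should not be read as the authors' model.

```python
import math

def cd_score(user, item, tag):
    """Linear Canonical Decomposition score: sum over factors of u_f * i_f * t_f."""
    return sum(u * i * t for u, i, t in zip(user, item, tag))

def gaussian_rbf(x, y, sigma=1.0):
    """Gaussian radial basis function exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def nonlinear_score(user, item, tag, sigma=1.0):
    """Hypothetical score (an assumption): kernel similarity of the tag to the user and to the item."""
    return gaussian_rbf(user, tag, sigma) + gaussian_rbf(item, tag, sigma)

print(cd_score([1, 2], [3, 4], [5, 6]))  # 63
```

Both scores cost time linear in the number of latent features, which is the property the abstract highlights for Canonical Decomposition models.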
Representation Learning for Aspect Category Detection in Online Reviews
Zhou, Xinjie (Peking University) | Wan, Xiaojun (Peking University) | Xiao, Jianguo (Peking University)
User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., "food" and "service" in restaurant reviews) is an important task of sentiment analysis and opinion mining. Given a predefined aspect category set, most previous research leverages hand-crafted features and a classification algorithm to accomplish the task. The crucial step to achieve better performance is feature engineering, which consumes much human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach to automatically learn useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations on a large set of reviews with noisy labels. Afterwards, we propose to generate deeper and hybrid features through neural networks stacked on the word vectors. A logistic regression classifier is finally trained with the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.
Retweet Behavior Prediction Using Hierarchical Dirichlet Process
Zhang, Qi (Fudan University) | Gong, Yeyun (Fudan University) | Guo, Ya (Fudan University) | Huang, Xuanjing (Fudan University)
Predicting retweet behavior is an essential step for various social network applications, such as business intelligence and popular event prediction. Owing to this growing demand, the task has attracted extensive attention in recent years. In this work, we propose a novel method using non-parametric statistical models to combine structural, textual, and temporal information to predict retweet behavior. To evaluate the proposed method, we collect a large number of microblogs and their corresponding social networks from a real microblog service. Experimental results on the constructed dataset demonstrate that the proposed method can achieve better performance than state-of-the-art methods. The relative improvement of the proposed method over the method using only textual information is more than 38.5% in terms of F1-Score.
Incorporating Implicit Link Preference Into Overlapping Community Detection
Zhang, Hongyi (The Chinese University of Hong Kong) | King, Irwin (The Chinese University of Hong Kong) | Lyu, Michael R. (The Chinese University of Hong Kong)
Community detection is an important technique to understand structures and patterns in complex networks. Recently, overlapping community detection has become a trend due to the ubiquity of overlapping and nested communities in the real world. However, existing approaches have ignored implicit link preference information, i.e., the fact that links can reflect a node's preference regarding the targets of the connections it wants to build. This information has a strong impact on community detection, since a node prefers to build links with nodes inside its community rather than with those outside it. In this paper, we propose a preference-based nonnegative matrix factorization (PNMF) model to incorporate implicit link preference information. Unlike conventional matrix factorization approaches, which simply approximate the original adjacency matrix in value, our model maximizes the likelihood of the preference order for each node, following the intuition that a node prefers its neighbors to other nodes. Our model overcomes the indiscriminate penalty problem, in which non-linked pairs inside one community are penalized in the objective function as heavily as those across two communities. We propose a learning algorithm that learns a node-community membership matrix via stochastic gradient descent with bootstrap sampling. We evaluate our PNMF model on several real-world networks. Experimental results show that our model outperforms state-of-the-art approaches and can be applied to large datasets.
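The idea of maximizing the likelihood of a preference order can be sketched as follows. The abstract does not give the likelihood, so this sketch assumes a logistic (sigmoid) model of the score gap between a neighbor and a non-neighbor, in the style of pair-wise ranking objectives. The names `pair_log_likelihood` and `sgd_step` are hypothetical, and only the membership row of node u is updated to keep the example short.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_log_likelihood(f_u, f_i, f_j):
    """Log-probability that node u prefers neighbor i to non-neighbor j (assumed sigmoid of the score gap)."""
    gap = dot(f_u, f_i) - dot(f_u, f_j)
    return -math.log1p(math.exp(-gap))

def sgd_step(f_u, f_i, f_j, lr=0.1):
    """One projected gradient-ascent step on u's membership row, clipped at zero to stay nonnegative."""
    gap = dot(f_u, f_i) - dot(f_u, f_j)
    weight = 1.0 - 1.0 / (1.0 + math.exp(-gap))  # derivative of log-sigmoid with respect to the gap
    return [max(0.0, u + lr * weight * (i - j)) for u, i, j in zip(f_u, f_i, f_j)]

f_u, f_i, f_j = [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]
before = pair_log_likelihood(f_u, f_i, f_j)
after = pair_log_likelihood(sgd_step(f_u, f_i, f_j), f_i, f_j)
print(after > before)  # True
```

In a full training loop under these assumptions, the triple (u, i, j) would be drawn at random at each step, with i a neighbor of u and j a non-neighbor, which is one reading of the bootstrap sampling the abstract mentions.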
Collaborative Topic Ranking: Leveraging Item Meta-Data for Sparsity Reduction
Yao, Weilong (University of Chinese Academy of Sciences) | He, Jing (Victoria University) | Wang, Hua (Victoria University) | Zhang, Yanchun (Victoria University) | Cao, Jie (Nanjing University of Finance and Economics)
Pair-wise ranking methods have been widely used in recommender systems to deal with implicit feedback. They attempt to discriminate between a handful of observed items and the large set of unobserved items. In these approaches, however, user preferences and item characteristics cannot be estimated reliably due to overfitting given highly sparse data. To alleviate this problem, we propose a novel hierarchical Bayesian framework which incorporates "bag-of-words" type meta-data on items into pair-wise ranking models for one-class collaborative filtering. The main idea of our method lies in extending pair-wise ranking with probabilistic topic modeling. Instead of regularizing item factors through a zero-mean Gaussian prior, our method introduces item-specific topic proportions as priors for item factors. As a by-product, interpretable latent factors for users and items may help explain recommendations in some applications. We conduct an experimental study on a real and publicly available dataset, and the results show that our algorithm is effective in providing accurate recommendations and interpreting user factors and item factors.
RAIN: Social Role-Aware Information Diffusion
Yang, Yang (Tsinghua University) | Tang, Jie (Tsinghua University) | Leung, Cane Wing-ki (Huawei Noah's Ark Lab) | Sun, Yizhou (Northeastern University) | Chen, Qicong (Tsinghua University) | Li, Juanzi (Tsinghua University) | Yang, Qiang (Huawei Noah's Ark Lab)
Information diffusion, which studies how information is propagated in social networks, has attracted considerable research effort recently. However, most existing approaches do not distinguish social roles that nodes may play in the diffusion process. In this paper, we study the interplay between users' social roles and their influence on information diffusion. We propose a Role-Aware INformation diffusion model (RAIN) that integrates social role recognition and diffusion modeling into a unified framework. We develop a Gibbs-sampling based algorithm to learn the proposed model using historical diffusion data. The proposed model can be applied to different scenarios. For instance, at the micro-level, the proposed model can be used to predict whether an individual user will repost a specific message; while at the macro-level, we can use the model to predict the scale and the duration of a diffusion process. We evaluate the proposed model on a real social media data set. Our model performs much better in both micro- and macro-level prediction than several alternative methods.
A Probabilistic Model for Bursty Topic Discovery in Microblogs
Yan, Xiaohui (Institute of Computing Technology, Chinese Academy of Sciences) | Guo, Jiafeng (Institute of Computing Technology, Chinese Academy of Sciences) | Lan, Yanyan (Institute of Computing Technology, Chinese Academy of Sciences) | Xu, Jun (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
Bursty topic discovery in microblogs is important for people to grasp essential and valuable information. However, the task is challenging since microblog posts are particularly short and noisy. This work develops a novel probabilistic model, namely the Bursty Biterm Topic Model (BBTM), to deal with the task. BBTM extends the Biterm Topic Model (BTM) by incorporating the burstiness of biterms as prior knowledge for bursty topic modeling, which has the following merits: 1) it handles the data sparsity problem in topic modeling over short texts in the same way as BTM; 2) it can automatically discover high-quality bursty topics in microblogs in a principled and efficient way. Extensive experiments on a standard Twitter dataset show that our approach significantly outperforms the state-of-the-art baselines.
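As background for the model above, a biterm in the BTM literature is an unordered pair of words that co-occur in the same short text. The sketch below only shows how biterms could be enumerated from one tokenized post. It does not model burstiness, and the function name `extract_biterms` is hypothetical.

```python
from itertools import combinations

def extract_biterms(tokens):
    """Return every unordered word pair from one short post, each pair sorted alphabetically."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

post = ["storm", "hits", "coast"]
print(extract_biterms(post))
# [('hits', 'storm'), ('coast', 'storm'), ('coast', 'hits')]
```

Because a post with m words yields m * (m - 1) / 2 biterms, even very short posts contribute several co-occurrence observations, which is why biterm-based models cope better with sparsity than document-level word counts.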
DynaDiffuse: A Dynamic Diffusion Model for Continuous Time Constrained Influence Maximization
Xie, Miao (University of Chinese Academy of Sciences) | Yang, Qiusong (Institute of Software, Chinese Academy of Sciences) | Wang, Qing (Institute of Software, Chinese Academy of Sciences) | Cong, Gao (Nanyang Technological University) | Melo, Gerard de (Tsinghua University/Microsoft Research Asia)
Understanding how phenomena spread in social networks is critical, but the problem is still not fully solved. Existing influence maximization models assume a static network, disregarding its evolution over time. We introduce the continuous time constrained influence maximization problem for dynamic diffusion networks, based on a novel diffusion model called DynaDiffuse. Although the problem is NP-hard, the influence spread functions are monotonic and submodular, enabling fast approximations on top of an innovative stochastic model checking approach. Experiments on real social network data show that our model finds higher quality solutions and our algorithm outperforms state-of-the-art alternatives.
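Monotonicity and submodularity matter because, for such functions under a simple budget on the number of seeds, the classic greedy procedure is known to achieve a (1 - 1/e) approximation. The sketch below shows that generic greedy procedure only. It is not the DynaDiffuse algorithm, it ignores the time constraint, and the toy `coverage` function is an assumed stand-in for a real influence spread estimate.

```python
def greedy_select(candidates, spread, k):
    """Pick up to k seeds, each time adding the candidate with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = spread(chosen)
        best, best_gain = None, 0
        for c in sorted(candidates):  # sorted for deterministic tie-breaking
            if c in chosen:
                continue
            gain = spread(chosen + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate adds anything
            break
        chosen.append(best)
    return chosen

# Toy stand-in for influence spread: the set of users each seed reaches.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(seeds):
    """Number of distinct users reached by a seed set (monotone and submodular)."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)

seeds = greedy_select(reach.keys(), coverage, k=2)
print(seeds, coverage(seeds))  # ['c', 'a'] 7
```

In practice the spread function would be an estimate of expected influence, so each marginal-gain evaluation is the expensive step that faster approximation schemes aim to reduce.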