Institute of Computing Technology, Chinese Academy of Sciences
News Verification by Exploiting Conflicting Social Viewpoints in Microblogs
Jin, Zhiwei (Institute of Computing Technology, Chinese Academy of Sciences) | Cao, Juan (Institute of Computing Technology, Chinese Academy of Sciences) | Zhang, Yongdong (Institute of Computing Technology, Chinese Academy of Sciences) | Luo, Jiebo (University of Rochester)
Fake news spreading in social media severely jeopardizes the veracity of online content. Fortunately, with the interactive and open features of microblogs, skeptical and opposing voices against fake news always arise along with it. The conflicting information, ignored by existing studies, is crucial for news verification. In this paper, we take advantage of this "wisdom of crowds" information to improve news verification by mining conflicting viewpoints in microblogs. First, we discover conflicting viewpoints in news tweets with a topic model method. Based on identified tweets' viewpoints, we then build a credibility propagation network of tweets linked with supporting or opposing relations. Finally, with iterative deduction, the credibility propagation on the network generates the final evaluation result for news. Experiments conducted on a real-world data set show that the news verification performance of our approach significantly outperforms those of the baseline approaches.
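The abstract does not give the propagation rule, so the following is only a minimal sketch of iterative credibility propagation over a signed tweet network. The update rule, the damping factor `alpha`, the score range [-1, 1], and the function name `propagate_credibility` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def propagate_credibility(signed_links, initial, alpha=0.5, iterations=50):
    """Iteratively mix each tweet's initial score with its neighbours' scores.

    signed_links[i][j] is +1 if tweets i and j support each other,
    -1 if they oppose each other, and 0 if they are unlinked (assumed encoding).
    """
    w = np.asarray(signed_links, dtype=float)
    # Normalize each row by its total absolute weight so scores stay bounded.
    row_sums = np.abs(w).sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    w = w / row_sums
    c0 = np.asarray(initial, dtype=float)
    c = c0.copy()
    for _ in range(iterations):
        # Supporting neighbours pull a score toward theirs; opposing ones push it away.
        c = (1 - alpha) * c0 + alpha * (w @ c)
    return c

# Toy network: tweets 0 and 1 support each other, tweet 2 opposes both.
links = [[0, 1, -1],
         [1, 0, -1],
         [-1, -1, 0]]
scores = propagate_credibility(links, initial=[0.8, 0.2, -0.5])
news_score = scores.mean()  # hypothetical aggregation into a news-level score
```

With row-normalized weights and `alpha` below 1, each update is a contraction, so the scores settle to a fixed point regardless of the number of tweets.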
Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations
Sun, Fei (Institute of Computing Technology, Chinese Academy of Sciences) | Guo, Jiafeng (Institute of Computing Technology, Chinese Academy of Sciences) | Lan, Yanyan (Institute of Computing Technology, Chinese Academy of Sciences) | Xu, Jun (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
The distributional hypothesis lies at the root of most existing word representation models, which infer a word's meaning from its external contexts. However, distributional models cannot handle rare and morphologically complex words very well, and they fail to identify some fine-grained linguistic regularities because they ignore word forms. In contrast, morphology holds that words are built from basic units, i.e., morphemes. Therefore, the meaning and function of rare words can be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. However, the limitation of morphology is that it cannot infer the relationship between two words that do not share any morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. These two models can also be extended to learn phrase representations according to the distributed morphology theory. We evaluate the proposed models on similarity tasks and analogy tasks. The results demonstrate that the proposed models can significantly outperform state-of-the-art models on both word and phrase representation learning.
Predicting Links and Their Building Time: A Path-Based Approach
Li, Manling (Institute of Computing Technology, Chinese Academy of Sciences) | Jia, Yantao (Institute of Computing Technology, Chinese Academy of Sciences) | Wang, Yuanzhuo (Institute of Computing Technology, Chinese Academy of Sciences) | Zhao, Zeya (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
Predicting links and their building time in a knowledge network has been extensively studied in recent years. Most structure-based predictive methods consider structures and the time information of edges separately, and thus fail to characterize the correlation between them. In this paper, we propose a structure called the Time-Difference-Labeled Path, and a link prediction method (TDLP). Experiments show that TDLP outperforms the state-of-the-art methods.
SPAN: Understanding a Question with Its Support Answers
Pang, Liang (Institute of Computing Technology, Chinese Academy of Sciences) | Lan, Yanyan (Institute of Computing Technology, Chinese Academy of Sciences) | Guo, Jiafeng (Institute of Computing Technology, Chinese Academy of Sciences) | Xu, Jun (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
Matching a question to its best answer is a common task in community question answering. In this paper, we focus on non-factoid questions and aim to pick out the best answer from a question's candidate answers. Most existing deep models directly measure the similarity between question and answer using their individual sentence embeddings. To tackle the lack of information in question descriptions and the lexical gap between questions and answers, we propose a novel deep architecture named SPAN. Specifically, we introduce support answers, defined as the best answers to questions similar to the original one, to help understand the question. We then obtain two kinds of similarity: one between the question and the candidate answer, and the other between the support answers and the candidate answer. The matching score is finally generated by combining them. Experiments on Yahoo! Answers demonstrate that SPAN can outperform the baseline models.
Text Matching as Image Recognition
Pang, Liang (Chinese Academy of Sciences) | Lan, Yanyan (Chinese Academy of Sciences) | Guo, Jiafeng (Chinese Academy of Sciences) | Xu, Jun (Chinese Academy of Sciences) | Wan, Shengxian (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture many complicated patterns based on extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as an image recognition problem. First, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is used to capture rich matching patterns layer by layer. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority over the baselines.
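The matching matrix described above can be sketched as follows. This is a minimal illustration that assumes cosine similarity between word embeddings as the word-level similarity and uses random toy embeddings; the abstract does not specify either choice, and the function name `matching_matrix` is hypothetical. The convolutional network that consumes the matrix is not shown.

```python
import numpy as np

def matching_matrix(emb_a, emb_b):
    """Cosine similarity between every word of text A and every word of text B.

    emb_a has shape (len_a, dim) and emb_b has shape (len_b, dim);
    the result has shape (len_a, len_b) and can be treated like a one-channel image.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T

# Toy embeddings (random, for illustration only).
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in ["the", "cat", "sat", "a", "dog"]}
text_a = ["the", "cat", "sat"]
text_b = ["a", "cat", "sat", "the"]
M = matching_matrix(np.stack([vocab[w] for w in text_a]),
                    np.stack([vocab[w] for w in text_b]))
```

In this toy example the shared bigram "cat sat" appears as a diagonal run of ones in `M`, which is the kind of local pattern a convolutional filter could pick up as an n-gram match.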
Learning User-Specific Latent Influence and Susceptibility from Information Cascades
Wang, Yongqing (Institute of Computing Technology, Chinese Academy of Sciences) | Shen, Huawei (Institute of Computing Technology, Chinese Academy of Sciences) | Liu, Shenghua (Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (Institute of Computing Technology, Chinese Academy of Sciences)
Predicting cascade dynamics has important implications for understanding information propagation and launching viral marketing. Previous works mainly adopt a pair-wise approach, modeling the propagation probability between pairs of users using n^2 independent parameters for n users. Consequently, these models suffer from severe overfitting, especially for pairs of users without direct interactions, which limits their prediction accuracy. Here we propose to model the cascade dynamics by learning two low-dimensional user-specific vectors from observed cascades, capturing each user's influence and susceptibility respectively. This model requires far fewer parameters and can thus combat overfitting. Moreover, it can naturally model context-dependent factors such as the cumulative effect in information propagation. Extensive experiments on a synthetic dataset and a large-scale microblogging dataset demonstrate that this model outperforms existing pair-wise models at predicting cascade dynamics, cascade size, and "who will be retweeted."
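To make the parameter saving concrete, the sketch below compares the n^2 parameters of a pair-wise model with the 2·n·d parameters needed for two d-dimensional vectors per user. The propagation function shown, a sigmoid of the dot product between the sender's influence vector and the receiver's susceptibility vector, is an assumption for illustration only; the abstract does not state the actual functional form, and all names here are hypothetical.

```python
import numpy as np

def parameter_counts(n_users, dim):
    """Return (pair-wise parameter count, latent-vector parameter count)."""
    pairwise = n_users ** 2      # one parameter per ordered pair of users
    latent = 2 * n_users * dim   # one influence and one susceptibility vector per user
    return pairwise, latent

def propagation_probability(influence, susceptibility, sender, receiver):
    # Assumed form: sigmoid of the influence/susceptibility dot product.
    score = influence[sender] @ susceptibility[receiver]
    return 1.0 / (1.0 + np.exp(-score))

rng = np.random.default_rng(0)
n_users, dim = 10_000, 16
influence = rng.normal(scale=0.1, size=(n_users, dim))
susceptibility = rng.normal(scale=0.1, size=(n_users, dim))

pairwise, latent = parameter_counts(n_users, dim)
p = propagation_probability(influence, susceptibility, sender=3, receiver=42)
```

For 10,000 users and 16 dimensions this is 100,000,000 pair-wise parameters against 320,000 latent ones, and a probability can still be computed for a pair of users who never interacted directly.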