
Determining the Veracity of Rumours on Twitter

Machine Learning

While social networks can provide an ideal platform for up-to-date information from individuals across the world, they have also proved to be a place where rumours fester and accidental or deliberate misinformation often emerges. In this article, we aim to support the task of making sense of social media data and, specifically, seek to build an autonomous message classifier that filters relevant and trustworthy information from Twitter. For our work, we collected about 100 million public tweets, including users' past tweets, from which we identified 72 rumours (41 true, 31 false). We considered over 80 trustworthiness measures, including the authors' profiles and past behaviour, the social network connections (graphs), and the content of the tweets themselves. We ran modern machine-learning classifiers over those measures to produce trustworthiness scores at various time windows from the outbreak of the rumour. Such time windows were key, as they allowed useful insight into the progression of the rumours. Our model proved significantly more accurate than those of similar studies in the literature. We also identified critical attributes of the data that give rise to the trustworthiness scores assigned. Finally, we developed a software demonstration that provides a visual user interface to allow the user to examine the analysis.

An Information Diffusion Approach to Rumor Propagation and Identification on Twitter

Machine Learning

With the increasing use of online social networks as a source of news and information, the propensity for a rumor to disseminate widely and quickly poses a great concern, especially in disaster situations where users do not have enough time to fact-check posts before making an informed decision to react to a post that appears to be credible. In this study, we explore the propagation pattern of rumors on Twitter by examining the dynamics of microscopic-level misinformation spread, based on the latent message and user interaction attributes. We perform supervised learning for feature selection and prediction. Experimental results with real-world data sets give the models' prediction accuracy at about 90% for the diffusion of both True and False topics. Our findings confirm that rumor cascades run deeper and that rumors masked as news, and messages that incite fear, will diffuse faster than other messages. We show that the models for True and False message propagation differ significantly, both in the prediction parameters and in the message features that govern the diffusion. Finally, we show that the diffusion pattern is an important metric in identifying the credibility of a tweet.

Why Do You Write This? Prediction of Influencers from Word Use

AAAI Conferences

With the widespread usage of social media, there has been much effort on detecting influential users for different information propagation applications. An inherent limitation of such methods is that they can only detect influential users after such users show observable signals of influence. However, in many real-world applications, including a counter-campaign, an organization needs a way to detect influencers early, so that it can take appropriate measures before it is too late to intervene. In this work, we present a method to detect such would-be influencers from their prior word usage in social media. We compute psycholinguistic category scores from word usage, and investigate how people with different scores exhibited different influence behaviors on Twitter. We also found psycholinguistic categories that show significant correlations with such behaviors, and built predictive models of influence from such category-based features. Our experiments using a real-world dataset validate that such predictions can be done with reasonable accuracy.
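The idea of computing category scores from word usage can be illustrated with a minimal sketch. The abstract does not say which lexicon or scoring formula the authors used, so everything below is an assumption made for illustration: the two-category `TOY_LEXICON` is hypothetical, and the score is simply the fraction of a user's words that fall in each category.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real study would use a full psycholinguistic dictionary.
TOY_LEXICON = {
    "positive_emotion": {"happy", "great", "love"},
    "certainty": {"always", "never", "definitely"},
}


def category_scores(text, lexicon=TOY_LEXICON):
    """Return, for each category, the fraction of words in text that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in lexicon}
    counts = Counter(words)
    return {
        category: sum(counts[word] for word in vocabulary) / len(words)
        for category, vocabulary in lexicon.items()
    }


scores = category_scores("I love this, it is always great")
```

Scores of this kind could then serve as per-user features for a predictive model, which is the role the abstract describes for its category-based features.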

Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository

Machine Learning

Nowadays, the Internet is a primary source of health information. The massive volume of fake health news spreading over the Internet has become a severe threat to public health. Numerous studies have been conducted in the fake news detection domain; however, few of them are designed to cope with the challenges of health news. For instance, the development of explainable methods is required for fake health news detection. To mitigate these problems, we construct a comprehensive repository, FakeHealth, which includes news contents with rich features, news reviews with detailed explanations, social engagements, and a user-user social network. Moreover, exploratory analyses are conducted to understand the characteristics of the datasets, analyze useful patterns, and validate the quality of the datasets for fake health news detection. We also discuss novel and potential future research directions for fake health news detection.

Analyzing Political Sentiment on Twitter

AAAI Conferences

Due to the vast amount of user-generated content in the emerging Web 2.0, there is a growing need for computational sentiment analysis of documents. Most of the current research in this field is devoted to product reviews from websites. Microblogs and social networks pose an even greater challenge to sentiment classification. However, marketing and political campaigns in particular draw on opinions expressed on Twitter or other social communication platforms. The objects of interest in this paper are the presidential candidates of the Republican Party in the USA and their campaign topics. In this paper, we introduce the combination of the noun phrases' frequency and their PMI measure as a constraint on aspect extraction. This compensates for sparse phrases receiving a higher score than those composed of high-frequency words. Evaluation shows that the meronymy relationship between politicians and their topics holds and improves the accuracy of aspect extraction.
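The interaction between phrase frequency and PMI can be shown with a small sketch. It uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) over adjacent word pairs; the abstract does not give the authors' exact formulation, so the function names, the use of bigrams in place of parsed noun phrases, and the thresholds `min_count` and `min_pmi` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): (count, pmi)} for every adjacent word pair in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = (count, math.log2(p_xy / (p_x * p_y)))
    return scores


def select_aspects(tokens, min_count=2, min_pmi=1.0):
    """Keep pairs that pass both an (assumed) frequency and an (assumed) PMI threshold."""
    return sorted(
        pair
        for pair, (count, pmi) in bigram_pmi(tokens).items()
        if count >= min_count and pmi >= min_pmi
    )


tokens = "tax reform is key and tax reform matters while rare phrase appears".split()
aspects = select_aspects(tokens)
```

In this toy input, the pair ("rare", "phrase") occurs once and gets a higher PMI than ("tax", "reform"), which occurs twice; the frequency threshold is what keeps the sparse pair out of the selected aspects.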