Timing Tweets to Increase Effectiveness of Information Campaigns

AAAI Conferences

Microblogging websites such as Twitter are increasingly being used by businesses/campaigners for timely dissemination of information to their followers. The diffusion of a tweet depends on several factors: the activity of the follower nodes, the responsiveness of follower nodes to tweets from the source node, the out-degree of the follower nodes, the content of recent related tweets seen by the follower node, etc. Using such factors, in this paper, we propose a framework to measure the effectiveness of an information campaign over Twitter. We consider a positive as well as a negative metric to measure the impact of a tweet: while retweets are used to measure the positive impact, the lack of a timely response from an active follower node is taken as a potential negative impact. We investigate the scheduling of tweets to increase the net positive impact while keeping the net negative impact below a desired level. We propose and study several scheduling algorithms by casting the problem in a Markov Decision Process (MDP) framework. In order to compare our algorithms, we estimate the model parameters from tweet data collected using the Twitter API from an arbitrarily selected node and its 6837 followers over several months. For this dataset, we find that if successive tweets in the campaign are novel, then substantial gains over user activity based scheduling can be obtained by scheduling tweets in time slots where the ratio of the expected positive and negative metrics is high. We call this the MaxRatio policy and we show that it is optimal under certain conditions. In cases where we are not certain about the response of users to successive related tweets, we identify another algorithm (which we call MaxReach) as a robust alternative.
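As a rough illustration of the slot-selection idea behind MaxRatio, the sketch below ranks time slots by the ratio of expected positive to expected negative impact and picks the best ones. It is a simplified sketch, not the paper's algorithm: the function name, the per-slot estimate lists, and the small `eps` guard are assumptions made for illustration, and the MDP formulation and the constraint on net negative impact are not modeled.

```python
def max_ratio_schedule(expected_positive, expected_negative, num_tweets, eps=1e-9):
    """Pick the time slots with the highest positive/negative ratio.

    expected_positive[i] and expected_negative[i] are hypothetical per-slot
    estimates (e.g. expected retweets vs. expected missed responses).
    """
    if len(expected_positive) != len(expected_negative):
        raise ValueError("both metric lists must cover the same slots")

    # Ratio per slot; eps avoids division by zero for slots with no negative impact.
    ratios = [
        (pos / (neg + eps), slot)
        for slot, (pos, neg) in enumerate(zip(expected_positive, expected_negative))
    ]
    # Highest ratio first; ties are broken in favour of the earlier slot.
    best = sorted(ratios, key=lambda item: (-item[0], item[1]))[:num_tweets]
    # Return the chosen slots in chronological order.
    return sorted(slot for _, slot in best)


slots = max_ratio_schedule([5, 8, 2, 9], [5, 2, 1, 9], num_tweets=2)
```

Here slot 3 has the largest expected positive impact but also the largest negative one, so a ratio-based rule prefers slots 1 and 2 instead.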


Reconstruction of Threaded Conversations in Online Discussion Forums

AAAI Conferences

Online discussion boards, or Internet forums, are a significant part of the Internet. People use Internet forums to post questions, provide advice and participate in discussions. These online conversations are represented as threads, and the conversation trees within these threads are important in understanding the behaviour of online users. Unfortunately, the reply structures of these threads are generally not publicly accessible or not maintained. Hence, in this paper, we introduce an efficient and simple approach to reconstruct the reply structure in threaded conversations. We contrast its accuracy against three baseline algorithms, and show that our algorithm can accurately recreate the in and out degree distributions of forum reply graphs built from the reconstructed reply structures.


Using the H-Index to Estimate Blog Authority

AAAI Conferences

Link analysis is a technique frequently used in the ranking of web sites. On the web, we often encounter content that is organized by entries, sorted from recent to old, and generally follows the structure of a blog. In this paper we explore and evaluate the usage of a bibliometrics measure, called h-index, for the task of blog ranking, in an information retrieval context. We base our experiments on the TREC Blogs08 collection, which comprises over 28 million posts. The results obtained indicate that the h-index is a robust metric that allows for an improved relevance discrimination between blogs, when compared to the in-degree. Additionally, tests performed using distinct versions of the post graph indicate that this metric might tolerate a certain level of link clutter.
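For readers unfamiliar with the measure, the following sketch computes an h-index from per-post in-link counts. Treating posts as "papers" and in-links as "citations" is an assumption about how the bibliometric definition carries over to blogs; the function name and input format are hypothetical.

```python
def h_index(inlink_counts):
    """Largest h such that h posts have at least h in-links each."""
    counts = sorted(inlink_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank  # the top `rank` posts all have at least `rank` in-links
        else:
            break
    return h


# A blog whose posts received 10, 8, 5, 4 and 3 in-links.
blog_h = h_index([10, 8, 5, 4, 3])
```

Unlike the raw in-degree, a single heavily linked post cannot raise this value much: a blog with in-link counts 25, 8, 5, 3, 3 has a total in-degree of 44 but an h-index of only 3.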


“Dancing with the Stars,” NBA Games, Politics: An Exploration of Twitter Users’ Response to Events

AAAI Conferences

Microblogging services such as Twitter offer great opportunities for analyzing the reactions of a wide audience with respect to current events. In this paper, we explore the correlation between types of user engagement and events centered around celebrities (e.g., personal or professional events involving Actors, Musicians, Politicians, Athletes).


Natural Language Processing to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency

AAAI Conferences

In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.


LeadLag LDA: Estimating Topic Specific Leads and Lags of Information Outlets

AAAI Conferences

Identifying which outlet in social media leads the rest in disseminating novel information on specific topics is an interesting challenge for information analysts and social scientists. In this work, we hypothesize that novel ideas are disseminated through the creation and propagation of new or newly emphasized key words, and therefore lead/lag of outlets can be estimated by tracking word usage across these outlets. First, we demonstrate the validity of our hypothesis by showing that a simple TF-IDF based nearest-neighbors approach can recover generally accepted lead/lag behavior on the outlet pair of ACM journal articles and conference papers. Next, we build a new topic model called LeadLag LDA that estimates the lead/lag of the outlets on specific topics. We validate the topic model using the lead/lag results from the TF-IDF nearest neighbors approach. Finally, we present results from our model on two different outlet pairs of blogs vs. news media and grant proposals vs. research publications that reveal interesting patterns.


Socio-Spatial Properties of Online Location-Based Social Networks

AAAI Conferences

The spatial structure of large-scale online social networks has been largely inaccessible due to the lack of available and accurate data about people’s location. However, with the recent surging popularity of location-based social services, data about the geographic position of users have become available for the first time, together with their online social connections. In this work we present a comprehensive study of the spatial properties of the social networks arising among users of three popular online location-based services. We observe robust universal features across them: while all networks exhibit about 40% of links below 100 km, we further discover strong heterogeneity across users, with different characteristic spatial lengths of interaction across both their social ties and social triads. We provide evidence that mechanisms akin to gravity models may influence how these social connections are created over space. Our results constitute the first large-scale study to unravel the socio-spatial properties of online location-based social networks.


Memes Online: Extracted, Subtracted, Injected, and Recollected

AAAI Conferences

Social media is playing an increasingly vital role in information dissemination. But with dissemination being more distributed, content often makes multiple hops, and consequently has opportunity to change. In this paper we focus on content that should be changing the least, namely quoted text. We find changes to be frequent, with their likelihood depending on the authority of the copied source and the type of site that is copying. We uncover patterns in the rate of appearance of new variants, their length, and popularity, and develop a simple model that is able to capture them. These patterns are distinct from ones produced when all copies are made from the same source, suggesting that information is evolving as it is being processed collectively in online social media.


Beyond Trending Topics: Real-World Event Identification on Twitter

AAAI Conferences

User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.


Classifying the Political Leaning of News Articles and Users from User Votes

AAAI Conferences

Social news aggregator services generate readers’ subjective reactions to news opinion articles. Can we use those as a resource to classify articles as liberal or conservative, even without knowing the self-identified political leaning of most users? We applied three semi-supervised learning methods that propagate classifications of political news articles and users as conservative or liberal, based on the assumption that liberal users will vote for liberal articles more often, and similarly for conservative users and articles. Starting from a few labeled articles and users, the algorithms propagate political leaning labels to the entire graph. In cross-validation, the best algorithm achieved 99.6% accuracy on held-out users and 96.3% accuracy on held-out articles. Adding social data such as users’ friendship or text features such as cosine similarity did not improve accuracy. The propagation algorithms, using the subjective liking data from users, also performed better than an SVM based text classifier, which achieved 92.0% accuracy on articles.
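The propagation idea can be illustrated with a generic averaging scheme over the user-article vote graph. This is only a sketch under stated assumptions and not one of the three methods evaluated in the paper: scores run from +1.0 (liberal) to -1.0 (conservative), seed labels stay fixed, and the function name, vote format, and iteration count are all hypothetical.

```python
from collections import defaultdict


def propagate_leaning(votes, seed_users, seed_articles, iterations=20):
    """Spread seed leaning scores over a bipartite vote graph.

    votes: iterable of (user, article) pairs meaning "user voted for article".
    seed_users / seed_articles: dicts mapping a few nodes to +1.0 or -1.0.
    """
    by_user, by_article = defaultdict(list), defaultdict(list)
    for user, article in votes:
        by_user[user].append(article)
        by_article[article].append(user)

    # Unlabeled nodes start neutral at 0.0.
    user_score = {u: float(seed_users.get(u, 0.0)) for u in by_user}
    article_score = {a: float(seed_articles.get(a, 0.0)) for a in by_article}

    for _ in range(iterations):
        # An article takes the average leaning of the users who voted for it.
        for article, voters in by_article.items():
            if article not in seed_articles:
                article_score[article] = sum(user_score[u] for u in voters) / len(voters)
        # A user takes the average leaning of the articles they voted for.
        for user, voted in by_user.items():
            if user not in seed_users:
                user_score[user] = sum(article_score[a] for a in voted) / len(voted)
    return user_score, article_score


votes = [("u1", "a1"), ("u1", "a2"), ("u2", "a2"),
         ("u3", "a3"), ("u3", "a4"), ("u4", "a4")]
user_score, article_score = propagate_leaning(
    votes, seed_users={"u1": 1.0, "u3": -1.0}, seed_articles={}
)
```

In this toy graph the unlabeled user `u2` shares a vote with the liberal seed `u1` and ends up with a positive score, while `u4` shares a vote with the conservative seed `u3` and ends up negative; the sign of each score gives the predicted leaning.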