What's in a @name? How Name Value Biases Judgment of Microblog Authors

AAAI Conferences

Bias can be defined as selective favoritism exhibited by human beings when faced with a decision among multiple options. Online communities present plenty of decision-making opportunities to their users. Users exhibit biases in their attachments, votes, ratings, and other decision-making tasks. We study bias amongst microblog users due to the value of an author's name. We describe the relationship between name value bias and number of followers, and cluster authors and readers based on patterns of bias they receive and exhibit, respectively. For authors, we show that content from known names (e.g., @CNN) is rated artificially high, while content from unknown names is rated artificially low. For readers, our results indicate that there are two types: slightly biased and heavily biased. A subsequent analysis of Twitter author names revealed attributes of names that underlie this bias, including effects for gender, type of name (individual versus organization), and degree of topical relevance. We discuss how our work can be instructive to content distributors and search engines in leveraging and presenting microblog content.


The Prevalence of Political Discourse in Non-Political Blogs

AAAI Conferences

Though political theorists have emphasized the importance of political discussion in non-political spaces, past study of online political discussion has focused on primarily political websites. Using a random sample from Blogger.com, we find that 25% of all political posts are from blogs that post about politics less than 20% of the time, because the vast majority of blogs post about politics some of the time but infrequently. Far from being taboo topics in those non-political blogs, political posts got slightly more comments than non-political posts in those same blogs, and the comments overwhelmingly engage the political topics of the post, mostly agreeing but frequently disagreeing as well. We argue that non-political spaces devoted primarily to personal diaries, hobbies, and other topics represent a substantial place of online political discussion and should be a site for further study.


Extracting Meta Statements from the Blogosphere

AAAI Conferences

Information extraction systems have been recently proposed for organizing and exploring content in large online text corpora as information networks. In such networks, the nodes are named entities (e.g., people, organizations) while the edges correspond to statements indicating relations among such entities. To date, such systems extract rather primitive networks, capturing only those relations which are expressed by direct statements. In many applications, it is useful to also extract more subtle relations which are often expressed as meta statements in the text. These can, for instance, provide the context for a statement (e.g., "Google acquired YouTube on October 2006"), or repercussion about a statement (e.g., "The US condemned Russia's invasion of Georgia"). In this work, we report on a system for extracting relations expressed in both direct statements and meta statements. We propose a method based on Conditional Random Fields that explores syntactic features to extract both kinds of statements seamlessly. We follow the Open Information Extraction paradigm, where a classifier is trained to recognize any type of relation instead of specific ones. Finally, our results show substantial improvements over a state-of-the-art information extraction system, both in terms of accuracy and, especially, recall.


Task Specialization in Social Production Communities: The Case of Geographic Volunteer Work

AAAI Conferences

In social production communities, users' individual and collective efforts lead to the creation of valuable resources (cf. Wikipedia, Open Street Map, and Reddit). Contributors to such communities often specialize in the tasks they choose to do. We found evidence for specialization by work type in Cyclopath, a geographic wiki for bicyclists: most users edit a single type of map feature, such as points of interest or roads and trails. We also saw a user lifecycle effect: as users gain experience, they specialize in editing roads and trails. Our findings suggest more effective ways to organize social production interfaces, compose units of work, and match them to users who want to help.


Dimensions of Self-Expression in Facebook Status Updates

AAAI Conferences

We describe the dimensions along which Facebook users tend to express themselves via status updates using the semi-automated text analysis approach, the Meaning Extraction Method (MEM). First, we examined dimensions of self-expression in all status updates from a sample of four million Facebook users from four English-speaking countries (the United States, Canada, the United Kingdom, and Australia) in order to examine how these countries vary in their self-expressions. All four countries showed a basic three-component structure, indicating that the medium is a stronger influence than country characteristics or demographics on how people use Facebook status updates. In each country, people vary in terms of the extent to which they use Informal Speech, share Positive Events, and discuss School in their Facebook status updates. Together, these factors tell us how users differ in their self-expression, and thus illustrate meaningful use cases for the product: Talking about what's going on tends to be positive, and people vary in terms of the extent to which their status updates are short, slangy emotional expressions and topics regarding school. The specific words that define these factors showed subtle differences across countries: The use of profanity indicates fewer school words (but only in Australia), whereas the UK shows greater use of slang terms (rather than profanity) when speaking informally. The MEM also identified English-language dialects as a meaningful dimension along which the countries varied. In sum, beyond simply indicating topicality of posts, this study provides insight into how status updates are used for self-expression. We discuss several theoretical frameworks that could produce these results, and more broadly discuss the generation of theoretical frameworks from wholly empirical data (such as naturalistic Internet speech) using the MEM.


Latent Set Models for Two-Mode Network Data

AAAI Conferences

Two-mode networks are a natural representation for many kinds of relational data. These networks are bipartite graphs consisting of two distinct sets ("modes") of entities. For example, one can model multiple recipient email data as a two-mode network of (a) individuals and (b) the emails that they send or receive. In this work we present a statistical model for two-mode network data which posits that individuals belong to latent sets and that the members of a particular set tend to co-appear. We show how to infer these latent sets from observed data using a Markov chain Monte Carlo inference algorithm. We apply the model to the Enron email corpus, using it to discover interpretable latent structure as well as evaluating its predictive accuracy on a missing data task. Extensions to the model are discussed that incorporate additional side information such as the email's sender or text content, further improving the accuracy of the model.
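As a rough illustration of the two-mode representation described above, the following sketch stores each email as the set of individuals who appear on it and counts how often pairs of individuals co-appear. It is not the latent set model or the MCMC inference from the abstract; the function name `co_appearance_counts` and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def co_appearance_counts(emails):
    """Count how often each pair of individuals appears on the same email.

    `emails` maps an email id (one mode) to the individuals who sent or
    received it (the other mode). This is only an empirical summary of the
    bipartite data, not a statistical model.
    """
    counts = Counter()
    for people in emails.values():
        # De-duplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(people)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: three emails, three individuals.
toy_emails = {
    "e1": ["ann", "bob", "cat"],
    "e2": ["ann", "bob"],
    "e3": ["cat"],
}
toy_counts = co_appearance_counts(toy_emails)
```

Pairs with high counts are the kind of co-appearance pattern that a latent set model would try to explain through shared set membership.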


Modelling Action Cascades in Social Networks

AAAI Conferences

The central idea in designing various marketing strategies for online social networks is to identify the influencers in the network. The influential individuals induce "word-of-mouth" effects in the network. These individuals are responsible for triggering long cascades of influence that convince their peers to perform a similar action (buying a product, for instance). Targeting these influentials usually leads to a vast spread of the information across the network. Hence it is important to identify such individuals in a network. One way to measure an individual's influencing capability on their peers is by their reach for a certain action. We formulate identifying the influencers in a network as a problem of predicting the average depth of cascades an individual can trigger. We first empirically identify factors that play a crucial role in triggering long cascades. Based on the analysis, we build a model for predicting the cascades triggered by a user for an action. The model uses features like influencing capabilities of the user and their friends, influencing capabilities of the particular action, and other user and network characteristics. Experiments show that the model effectively improves the predictions over several baselines.


Social Lens: Personalization Around User Defined Collections for Filtering Enterprise Message Streams

AAAI Conferences

Social media has led to a data explosion and has begun to play an ever increasing role as a valuable source of information and a mechanism for information discovery. The wealth of data highlights the need for methods to filter and sort information in order to allow users to discover useful information. Most traditional solutions focus on the user, either the user's social network, or a form of personalization based on collaborative filtering or predictive user modeling. This paper presents a novel algorithm to view information through a lens based on a user defined collection while excluding the attributes of the user from the analysis. As a result, the lens is transparent, tunable, and sharable amongst users, and it allows both a reduction in information overload and the discovery of new related content.


Timing Tweets to Increase Effectiveness of Information Campaigns

AAAI Conferences

Microblogging websites such as Twitter are increasingly being used by businesses/campaigners for timely dissemination of information to their followers. The diffusion of a tweet depends on several factors: the activity of the follower nodes, the responsiveness of follower nodes to tweets from the source node, the out-degree of the follower nodes, the content of recent related tweets seen by the follower node, etc. Using such factors, in this paper, we propose a framework to measure the effectiveness of an information campaign over Twitter. We consider a positive as well as a negative metric to measure the impact of a tweet: while retweets are used to measure the positive impact, the lack of a timely response from an active follower node is taken as a potential negative impact. We investigate the scheduling of tweets to increase the net positive impact while keeping the net negative impact below a desired level. We propose and study several scheduling algorithms by casting the problem in a Markov Decision Process (MDP) framework. In order to compare our algorithms, we estimate the model parameters from tweet data collected using the Twitter API from an arbitrarily selected node and its 6837 followers over several months. For this dataset, we find that if successive tweets in the campaign are novel, then substantial gains over user activity based scheduling can be obtained by scheduling tweets in time slots where the ratio of the expected positive and negative metrics is high. We call this the MaxRatio policy and we show that it is optimal under certain conditions. In cases where we are not certain about the response of users to successive related tweets, we identify another algorithm (which we call MaxReach) as a robust alternative.
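The slot-selection idea behind the MaxRatio policy can be sketched as follows, assuming the expected positive and negative metrics per time slot have already been estimated. This is a simplified illustration, not the paper's MDP formulation: it ignores the constraint on net negative impact, and the function name `max_ratio_slots`, the `eps` guard, and the sample numbers are all hypothetical.

```python
def max_ratio_slots(expected_pos, expected_neg, k, eps=1e-9):
    """Pick the k time slots with the highest positive/negative ratio.

    expected_pos[i] and expected_neg[i] are assumed estimates of the
    positive (e.g. retweets) and negative (e.g. missed responses) metrics
    for slot i. `eps` avoids division by zero and is an illustrative choice.
    """
    if len(expected_pos) != len(expected_neg):
        raise ValueError("metric sequences must have the same length")
    ratios = [p / (n + eps) for p, n in zip(expected_pos, expected_neg)]
    # Rank slot indices by ratio, best first, then keep the top k.
    ranked = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    # Return the chosen slots in chronological order.
    return sorted(ranked[:k])


# Hypothetical estimates for four time slots.
chosen = max_ratio_slots([2.0, 6.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0], k=2)
```

With these sample numbers the per-slot ratios are roughly 2, 3, 1, and 0.25, so the first two slots are selected.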


Political Polarization on Twitter

AAAI Conferences

In this study we investigate how social media shape the networked public sphere and facilitate communication between communities with different political orientations. We examine two networks of political communication on Twitter, comprising more than 250,000 tweets from the six weeks leading up to the 2010 U.S. congressional midterm elections. Using a combination of network clustering algorithms and manually annotated data, we demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Surprisingly, this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users in which ideologically opposed individuals interact at a much higher rate compared to the network of retweets. To explain the distinct topologies of the retweet and mention networks, we conjecture that politically motivated individuals provoke interaction by injecting partisan content into information streams whose primary audience consists of ideologically opposed users. We conclude with statistical evidence in support of this hypothesis.