Happy, Nervous or Surprised? Classification of Human Affective States in Social Media
Choudhury, Munmun De (Microsoft Research, Redmond) | Gamon, Michael (Microsoft Research, Redmond) | Counts, Scott (Microsoft Research, Redmond)
Sentiment classification has been a well-investigated research area in the computational linguistics community. However, most of the research focuses on detecting only the polarity of text, often requiring extensive manual labeling of ground truth. Additionally, little attention has been directed towards a finer analysis of human moods and affective states. Motivated by research in psychology, we propose and develop a classifier of several human affective states in social media. Starting with about 200 moods, we utilize Mechanical Turk studies to derive naturalistic signals from posts shared on Twitter about a variety of affects of individuals. This dataset is then deployed in an affect classification task with promising results. Our findings indicate that different types of affect involve different emotional content and usage styles; hence the performance of the classifier on various affects can differ considerably.
Enhancing Event Descriptions through Twitter Mining
Tanev, Hristo (Joint Research Centre, European Commission) | Ehrmann, Maud (Joint Research Centre, European Commission) | Piskorski, Jakub (Frontex) | Zavarella, Vanni (Joint Research Centre, European Commission)
We describe a simple IR approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We show that methods based on word co-occurrence clustering, domain-specific keywords and named entity recognition improve performance with respect to a basic approach.
Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia
Jurgens, David (University of California, Los Angeles and HRL Laboratories, LLC) | Lu, Tsai-Ching (HRL Laboratories, LLC)
Wikipedia is a collaborative setting with both combative and cooperative editing. We propose a new method for investigating the types of editor interactions using a novel representation of Wikipedia's revision history as a temporal, bipartite network with multiple node and edge types for users and revisions. From this representation we identify significant author interactions as network motifs and show how the motif types capture important, diverse editing behaviors. Two experiments demonstrate the further benefit of motifs. First, we demonstrate a significant performance improvement over a purely revision-based analysis in classifying pages as combative or cooperative by using motifs; second, we use motifs as a basis for analyzing trends in the dynamics of editor behavior to explain Wikipedia's content growth.
Feasibility Study on Detection of Transportation Information Exploiting Twitter as a Sensor
Sasaki, Kenta (Toshiba Corporation) | Nagano, Shinichi (Toshiba Corporation) | Ueno, Koji (Toshiba Corporation) | Cho, Kenta (Toshiba Corporation)
The concept of a smart community has recently been attracting great attention as a means of utilizing energy effectively. One of the modules constituting the smart community is an intelligent transportation system, in which various sensors track movements of people and vehicles in real time to optimize migration pathways or means. Social media have the potential to serve as sensors, since people often post transportation information on such media. This paper presents a feasibility study on detecting information, focusing on train status information, by exploiting Twitter as a sensor. We dealt with two issues: (1) for the ambiguity of textual information expressed in tweets, we utilized heuristic rules in text manipulation, and (2) for the differences in the numbers of tweets among train lines, we optimized parameter values in statistical analysis for each train line. The experimental results show that the F-measure of detecting the information was more than 0.85 and the time taken to detect the information was less than 4 minutes. As a result we confirmed the high potential of detecting transportation information through Twitter.
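For reference, the F-measure reported above is the harmonic mean of precision and recall. The following is a minimal Python sketch, assuming the balanced F1 variant; the abstract does not state which F-beta was used, and the counts below are made up for illustration rather than taken from the paper.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """Return the F-beta score computed from raw detection counts."""
    if true_pos == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only; not taken from the paper.
score = f_measure(true_pos=90, false_pos=10, false_neg=20)
```

With these illustrative counts, precision is 0.9 and recall is about 0.82, so the score lands just above the 0.85 level the abstract mentions.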
A Supervised Approach to Predict Company Acquisition with Factual and Topic Features Using Profiles and News Articles on TechCrunch
Xiang, Guang (Carnegie Mellon University) | Zheng, Zeyu (Carnegie Mellon University) | Wen, Miaomiao (Carnegie Mellon University) | Hong, Jason (Carnegie Mellon University) | Rose, Carolyn (Carnegie Mellon University) | Liu, Chao (Microsoft Research)
Merger and Acquisition (M&A) prediction has been an interesting and challenging research topic in the past few decades. However, past work has only adopted numerical features in building models, and the valuable textual information from the great variety of social media sites has not been touched at all. To fully explore this information, we used the profiles and news articles for companies and people on TechCrunch, the leading and largest public database for the tech world, which anybody can edit. Specifically, we explored topic features via topic modeling techniques, as well as a set of other novel features of our design within a machine learning framework. We conducted experiments of the largest scale in the literature, and achieved a high true positive rate (TP) between 60% and 79.8% with a false positive rate (FP) mostly between 0% and 8.3% over company categories with a small number of missing attributes in the CrunchBase profiles.
Evolutionary Clustering and Analysis of User Behaviour in Online Forums
Morrison, Donn (Digital Enterprise Research Institute) | McLoughlin, Ian (Digital Enterprise Research Institute) | Hogan, Alice (Digital Enterprise Research Institute) | Hayes, Conor (Digital Enterprise Research Institute)
In this paper we cluster and analyse temporal user behaviour in online communities. We adapt a simple unsupervised clustering algorithm to an evolutionary setting where we cluster users into prototypical behavioural roles based on features derived from their ego-centric reply-graphs. We then analyse changes in the role membership of the users over time, the change in role composition of forums over time and examine the differences between forums in terms of role composition. We perform this analysis on 200 forums from a popular national bulletin board and 14 enterprise technical support forums.
The YouTube Social Network
Wattenhofer, Mirjam (Google Zurich) | Wattenhofer, Roger (ETH Zurich) | Zhu, Zack (ETH Zurich)
Today, YouTube is the largest user-driven video content provider in the world; it has become a major platform for disseminating multimedia information. A major contribution to its success comes from the user-to-user social experience that differentiates it from traditional content broadcasters. This work examines the social network aspect of YouTube by measuring the full-scale YouTube subscription graph, comment graph, and video content corpus. We find YouTube to deviate significantly from network characteristics that mark traditional online social networks, such as homophily, reciprocative linking, and assortativity. However, compared to reported characteristics of another content-driven online social network, Twitter, YouTube is remarkably similar. Examining the social and content facets of user popularity, we find a stronger correlation between a user's social popularity and his/her most popular content as opposed to typical content popularity. Finally, we demonstrate an application of our measurements for classifying YouTube Partners, who are selected users that share YouTube's advertisement revenue. Results are motivating despite the highly imbalanced nature of the classification problem.
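One of the characteristics named above, reciprocative linking, can be illustrated on a directed graph as the fraction of edges whose reverse edge also exists. The sketch below is only an illustration of that common definition, assuming a plain edge list; the abstract does not say how the authors computed it, and the toy subscription data is invented.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists."""
    # Deduplicate edges and drop self-loops, which are trivially "reciprocal".
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Invented toy subscription graph: only a <-> b is mutual.
toy_subscriptions = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
r = reciprocity(toy_subscriptions)
```

A low value on a subscription graph would indicate that most links are one-directional, which is the kind of deviation from traditional social networks the abstract describes.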
Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter
Chen, Lu (Wright State University) | Wang, Wenbo (Wright State University) | Nagarajan, Meenakshi (IBM Almaden Research Center) | Wang, Shaojun (Wright State University) | Sheth, Amit P. (Wright State University)
The problem of automatic extraction of sentiment expressions from informal text, as in microblogs such as tweets, is a recent area of investigation. Compared to formal text, such as in product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions that cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., movie, or person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression, where the polarity of a sentiment expression is determined by the nature of its target; (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus sizes.
Epidemic Intelligence for the Crowd, by the Crowd
Diaz-Aviles, Ernesto (University of Hannover) | Stewart, Avaré (University of Hannover) | Velasco, Edward (Robert Koch Institute) | Denecke, Kerstin (University of Hannover) | Nejdl, Wolfgang (University of Hannover)
Tracking Twitter for public health has shown great potential. However, most recent work has focused on correlating Twitter messages with rates of influenza, a disease that exhibits a marked seasonal pattern. In the presence of sudden outbreaks, how can social media streams be used to strengthen surveillance capacity? In May 2011, Germany reported an outbreak of Enterohemorrhagic Escherichia coli (EHEC). It was one of the largest described outbreaks of EHEC worldwide and the largest in Germany. In this work, we study the crowd's behavior in Twitter during the outbreak. In particular, we report how tracking Twitter helped to detect key user messages that triggered signal detection alarms before MedISys and other well-established early warning systems. We also introduce a personalized learning-to-rank approach that exploits the relationships discovered by: (i) latent semantic topics computed using Latent Dirichlet Allocation (LDA), and (ii) observing the social tagging behavior in Twitter, to rank tweets for epidemic intelligence. Our results provide the grounds for new public health research based on social media.
Trust Propagation with Mixed-Effects Models
Overgoor, Jan (Stanford University) | Wulczyn, Ellery (Stanford University) | Potts, Christopher (Stanford University)
Web-based social networks typically use public trust systems to facilitate interactions between strangers. These systems can be corrupted by misleading information spread under the cover of anonymity, or exhibit a strong bias towards positive feedback, originating from the fear of reciprocity. Trust propagation algorithms seek to overcome these shortcomings by inferring trust ratings between strangers from trust ratings between acquaintances and the structure of the network that connects them. We investigate a trust propagation algorithm based on user triads, where the trust one user has in another is predicted via an intermediary user. The propagation function can be applied iteratively to propagate trust along paths between a source user and a target user. We evaluate this approach using the trust network of the CouchSurfing community, which consists of 7.6M trust-valued edges between 1.1M users. We show that our model outperforms one that relies only on the trustworthiness of the target user (a kind of public trust system). In addition, we show that performance is significantly improved by bringing in user-level variability using mixed-effects regression models.
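The idea of applying a triad-level propagation function iteratively along a path can be sketched as follows. The function `propagate` here is a hypothetical stand-in (a simple product of ratings scaled to [0, 1]); the paper's actual function is fitted with mixed-effects regression and is not specified in this abstract. The user names and ratings are likewise invented.

```python
from functools import reduce


def propagate(trust_ab: float, trust_bc: float) -> float:
    """Hypothetical triad rule: estimate A's trust in C via intermediary B."""
    return trust_ab * trust_bc


def trust_along_path(path, ratings):
    """Fold the triad rule over consecutive edges of a source-to-target path.

    `path` must contain at least two users, and every consecutive pair
    must have a rating in `ratings`.
    """
    edge_trusts = [ratings[(u, v)] for u, v in zip(path, path[1:])]
    return reduce(propagate, edge_trusts)


# Invented ratings in [0, 1] between acquaintances.
ratings = {("ann", "bob"): 0.9, ("bob", "cy"): 0.8, ("cy", "dee"): 0.5}
estimate = trust_along_path(["ann", "bob", "cy", "dee"], ratings)
```

Under this stand-in rule the estimate can only shrink as the path grows longer; a learned propagation function need not behave that way.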