Industry
Using Group Membership Markers for Group Identification
Gawron, Jean Mark (San Diego State University) | Gupta, Dipak (San Diego State University) | Stephens, Kellen (San Diego State University) | Tsou, Ming-Hsiang (San Diego State University) | Spitzberg, Brian (San Diego State University) | An, Li (San Diego State University)
We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and for prioritizing the data found. We compare three ranking systems: one employing a small hand-selected vocabulary based on group membership markers used by insiders to identify members and member properties (us) and outsiders and threats (them); one with a much larger vocabulary; and one with a small vocabulary chosen by Mutual Information. We use the same vocabularies to build classifiers. The ranker that achieves the best correlations with human judgments uses the small us-them vocabulary. We confirm and extend recent results in sentiment analysis (Paltoglou 2010), showing that a feature-weighting scheme taken from classical IR (TF-IDF) produces the best ranking system; we also find, surprisingly, that adjusting these weights with SVM training, while producing a better classifier, produces a worse ranker. Similarly, increasing vocabulary size improves classification while worsening ranking.
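As an illustration of TF-IDF-weighted ranking over a small fixed vocabulary, here is a minimal Python sketch. The "us"/"them" word list, whitespace tokenization, smoothed IDF formula, and length normalization are illustrative assumptions rather than the authors' settings, and `rank_documents` is a hypothetical helper name.

```python
import math
from collections import Counter


def idf_weights(doc_sets, vocab):
    """Smoothed inverse document frequency for each vocabulary term."""
    n = len(doc_sets)
    weights = {}
    for term in vocab:
        df = sum(1 for d in doc_sets if term in d)
        # +1 smoothing keeps the weight finite for terms absent from the corpus.
        weights[term] = math.log((n + 1) / (df + 1)) + 1.0
    return weights


def tfidf_score(tokens, idf):
    """Sum of tf * idf over vocabulary terms, normalized by document length."""
    counts = Counter(tokens)
    raw = sum(counts[term] * w for term, w in idf.items())
    return raw / max(len(tokens), 1)


def rank_documents(texts, vocab):
    """Return document indices sorted from highest to lowest score, plus the scores."""
    docs = [t.lower().split() for t in texts]
    idf = idf_weights([set(d) for d in docs], vocab)
    scores = [tfidf_score(d, idf) for d in docs]
    order = sorted(range(len(texts)), key=lambda i: -scores[i])
    return order, scores


# Hypothetical "us"/"them" markers, for illustration only.
US_THEM_VOCAB = {"our", "brothers", "us", "enemy"}

texts = [
    "we must defend our brothers against the enemy",
    "the weather is nice today",
    "our brothers stand with us",
]
order, scores = rank_documents(texts, US_THEM_VOCAB)
print(order)  # [2, 0, 1]
```

Length normalization is one design choice among several; without it, longer documents would tend to rank higher simply by containing more marker tokens.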
Epidemic Intelligence for the Crowd, by the Crowd
Diaz-Aviles, Ernesto (University of Hannover) | Stewart, Avaré (University of Hannover) | Velasco, Edward (Robert Koch Institute) | Denecke, Kerstin (University of Hannover) | Nejdl, Wolfgang (University of Hannover)
Tracking Twitter for public health has shown great potential. However, most recent work has focused on correlating Twitter messages with influenza rates, and influenza exhibits a marked seasonal pattern. In the presence of sudden outbreaks, how can social media streams be used to strengthen surveillance capacity? In May 2011, Germany reported an outbreak of Enterohemorrhagic Escherichia coli (EHEC). It was one of the largest described outbreaks of EHEC worldwide and the largest in Germany. In this work, we study the crowd's behavior on Twitter during the outbreak. In particular, we report how tracking Twitter helped to detect key user messages that triggered signal detection alarms before MedISys and other well-established early warning systems did. We also introduce a personalized learning-to-rank approach that ranks tweets for epidemic intelligence by exploiting the relationships discovered by (i) latent semantic topics computed using Latent Dirichlet Allocation (LDA) and (ii) the social tagging behavior observed on Twitter. Our results provide grounds for new public health research based on social media.
Happy, Nervous or Surprised? Classification of Human Affective States in Social Media
De Choudhury, Munmun (Microsoft Research, Redmond) | Gamon, Michael (Microsoft Research, Redmond) | Counts, Scott (Microsoft Research, Redmond)
Sentiment classification has been a well-investigated research area in the computational linguistics community. However, most research has focused primarily on detecting the polarity of text, often requiring extensive manual labeling of ground truth. Additionally, little attention has been directed towards a finer analysis of human moods and affective states. Motivated by research in psychology, we propose and develop a classifier of several human affective states in social media. Starting with about 200 moods, we use Mechanical Turk studies to derive naturalistic signals of a variety of individual affects from posts shared on Twitter. This dataset is then used in an affect classification task, with promising results. Our findings indicate that different types of affect involve different emotional content and usage styles; hence the classifier's performance can differ considerably across affects.
Identifying Microblogs for Targeted Contextual Advertising
Dave, Kushal Shailesh (International Institute of Information Technology, Hyderabad) | Varma, Vasudeva (International Institute of Information Technology, Hyderabad)
Microblogging sites such as Facebook, Twitter, and Google+ present a good opportunity for targeting advertisements that are contextually related to microblog content. However, the sparse and noisy nature of the text makes identifying microblogs suitable for advertising a very hard problem. In this work, we approach the problem of identifying microblogs that could be targeted for advertisements as a two-step classification problem. In the first pass, microblogs suitable for advertising are identified. In the second pass, we build a model to find the sentiment of each advertisable microblog. The systems use features derived from part-of-speech tags and tweet content, and they use external resources such as query logs and n-gram dictionaries derived from previously labeled data. This work aims to provide thorough insight into the problem and to analyze various features to assess which contribute the most towards identifying tweets that can be targeted for advertisements.
Where Online Friends Meet: Social Communities in Location-Based Networks
Brown, Chloë (University of Cambridge) | Nicosia, Vincenzo (University of Cambridge) | Scellato, Salvatore (University of Cambridge) | Noulas, Anastasios (University of Cambridge) | Mascolo, Cecilia (University of Cambridge)
Recent research suggests that, as in offline scenarios, spatial proximity increases the likelihood that two individuals establish an online social connection, and geographic closeness could therefore influence the formation of online communities. In this work we present a study of communities in two online social networks with location-sharing features and analyze their social and spatial properties. We study the places users visit to understand whether communities revolve around places or whether they exist independently. Our results suggest that community structure in social networks may arise from both social and spatial factors, so that exploiting information about the places where people go could benefit the definition of new community detection methods and community evolution models.
What's in Your Tweets? I Know Who You Supported in the UK 2010 General Election
Boutet, Antoine (INRIA Rennes Bretagne Atlantique) | Kim, Hyoungshick (University of Cambridge) | Yoneki, Eiko (University of Cambridge)
Today, social media such as Twitter have become necessary for monitoring public trends on political issues. As a case study, we collected the main stream of Twitter messages related to the 2010 UK general election during the election period. We analyse the characteristics of the three main parties in the election. We also propose a simple and practical algorithm to identify the political leaning of users from the number of their Twitter messages that appear to be related to each political party. The experimental results show that the best-performing classification method, which uses the number of Twitter messages referring to a particular political party, achieved about 86% classification accuracy without any training phase.
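A training-free classifier of the kind described can be sketched in a few lines of Python: label each user with the party their tweets mention most often. The `PARTY_KEYWORDS` lists and helper names are hypothetical, and leaving ties unlabeled is an assumption; the abstract states only that the method counts messages referring to each party.

```python
from collections import Counter

# Hypothetical keyword lists; the paper's actual party-matching rules are not given.
PARTY_KEYWORDS = {
    "Conservative": {"conservative", "conservatives", "tory", "tories", "cameron"},
    "Labour": {"labour", "gordonbrown"},
    "Liberal Democrat": {"libdem", "libdems", "clegg"},
}


def tweet_parties(tweet):
    """Return the set of parties a tweet appears to refer to."""
    tokens = {tok.strip("#@.,!?:;").lower() for tok in tweet.split()}
    return {party for party, kws in PARTY_KEYWORDS.items() if tokens & kws}


def infer_leaning(tweets):
    """Label a user with the party their tweets mention most often (no training)."""
    counts = Counter()
    for tweet in tweets:
        for party in tweet_parties(tweet):
            counts[party] += 1
    if not counts:
        return None  # the user never referred to any party
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # tie: leave the user unlabeled
    return ranked[0][0]


print(infer_leaning(["Vote #Labour on Thursday!", "Great speech by Clegg", "Labour has my vote"]))
# Labour
```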
An Evaluation of the Role of Sentiment in Second Screen Microblog Search Tasks
Bermingham, Adam (Dublin City University) | Smeaton, Alan F (Dublin City University)
The recent prominence of the real-time web is proving both challenging and disruptive for information retrieval and web data mining research. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user's query at a point in time, automated methods are required to sift through this information. Sentiment analysis offers a promising direction for modelling microblog content. We build and evaluate a sentiment-based filtering system using real-time user studies. We find that sentiment plays a significant role in the search scenarios, observing detrimental effects when certain sentiment types are filtered out. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes and users' prior topic sentiment.
More or Less: Amount of Personal Information Displayed in Social Network Site Profiles and Its Impact on Viewers’ Intentions to Socialize with the Profile Owner
Baruh, Lemi (Koc University) | Chisik, Yoram (University of Madeira) | Bisson, Christophe (Kadir Has University) | Senova, Basak (NOMAD)
This paper presents the results of an experiment that employed a 2 (low vs. high information) by 2 (male vs. female profile) design to investigate the relationship between the amount of information displayed in a Social Network Site (SNS) profile and profile viewers’ intentions to engage in further social interactions (communicate online, add to SNS profile, and meet face-to-face) with the profile owner. The results indicate that more information increases the likelihood of relationship initiation for male profiles but decreases it for female profiles. Also, viewers are inclined to initiate an interaction when less information is presented in the SNS profile of a person of the opposite sex, but they require more information from profiles of their own sex.
Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors
Al Zamal, Faiyaz (McGill University) | Liu, Wendy (McGill University) | Ruths, Derek (McGill University)
In this paper, we extend existing work on latent attribute inference by leveraging the principle of homophily: we evaluate the inference accuracy gained by augmenting the user features with features derived from the Twitter profiles and postings of her friends. We consider three attributes which have varying degrees of assortativity: gender, age, and political affiliation. Our approach yields a significant and robust increase in accuracy for both age and political affiliation, indicating that our approach boosts performance for attributes with moderate to high assortativity. Furthermore, different neighborhood subsets yielded optimal performance for different attributes, suggesting that different subsamples of the user's neighborhood characterize different aspects of the user herself. Finally, inferences using only the features of a user's neighbors outperformed those based on the user's features alone. This suggests that the neighborhood context alone carries substantial information about the user.
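One simple way to augment a user's features with neighbor-derived features is sketched below in Python. It assumes that each user already has a fixed-length numeric feature vector and that neighbors' vectors are combined by averaging. The abstract does not say how the authors aggregate neighbor features, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean of equal-length vectors; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augment_with_neighbors(
    user: str,
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> List[float]:
    """Concatenate a user's own features with the mean of their neighbors' features."""
    own = features[user]
    neigh = [features[n] for n in neighbors.get(user, []) if n in features]
    return list(own) + mean_vector(neigh, len(own))


features = {"u": [1.0, 0.0], "a": [0.0, 2.0], "b": [2.0, 0.0]}
neighbors = {"u": ["a", "b"]}
print(augment_with_neighbors("u", features, neighbors))  # [1.0, 0.0, 1.0, 1.0]
```

Under these assumptions, the neighbor-only setting mentioned in the abstract corresponds to keeping just the second half of the returned vector.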
Catching the Long-Tail: Extracting Local News Events from Twitter
Agarwal, Puneet (TCS Innovation Labs, Delhi) | Vaithiyanathan, Rajgopal (TCS Innovation Labs, Delhi) | Sharma, Saurabh (TCS Innovation Labs, Delhi) | Shroff, Gautam (TCS Innovation Labs, Delhi)
Twitter, used in 200 countries with over 250 million tweets a day, is a rich source of local news from around the world. Many events of local importance are first reported on Twitter, including many that never reach news channels. Further, there are often only a few tweets reporting each such event, in contrast with the larger volumes that follow events of wider significance. Even though such events may be primarily of local importance, they can also be of critical interest to some specific but possibly far-flung entities: for example, a fire in a supplier’s factory half-way around the world may be of interest even from afar. In this paper we describe how this ‘long tail’ of events can be detected in spite of their sparsity. We then extract and correlate information from multiple tweets describing the same event. Our generic architecture for converting a tweet-stream into event-objects uses locality sensitive hashing, classification, boosting, information extraction and clustering. Our results, based on millions of tweets monitored over many months, appear to validate our approach and architecture: we achieved success rates in the 80% range for event detection and 76% for event correlation; we also reduced tweet comparisons by 80% using LSH.
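The abstract credits locality sensitive hashing with the reduction in tweet comparisons but does not name the LSH family used. The Python sketch below assumes MinHash with banding over word bigrams and uses seeded CRC32 as a cheap stand-in hash family. All names and parameters are illustrative. Only pairs that share a bucket would go on to a full comparison.

```python
import zlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=2):
    """Set of word k-grams; very short texts become a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(shingle_set, num_hashes):
    """One minimum per seeded CRC32 hash function (a stand-in hash family)."""
    return [
        min(zlib.crc32(f"{seed}:{s}".encode("utf-8")) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def candidate_pairs(texts, num_hashes=20, bands=10, k=2):
    """Index pairs whose MinHash signatures agree on at least one whole band."""
    if num_hashes % bands:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, text in enumerate(texts):
        sig = minhash_signature(shingles(text, k), num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


tweets = [
    "fire at factory in chennai",
    "fire at factory in chennai tonight",
    "new phone launched today",
]
print(candidate_pairs(tweets))
```

With 10 bands of 2 rows, two tweets whose shingle sets have Jaccard similarity 0.8 become candidates with probability 1 - (1 - 0.8^2)^10 ≈ 0.99996. Unrelated tweets rarely collide, which is where the savings in pairwise comparisons come from.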