Visualizing Topic Models
Chaney, Allison June-Barlow (Princeton University) | Blei, David M. (Princeton University)
Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method that learns the underlying themes in a large collection of otherwise unorganized documents. This discovered structure summarizes and organizes the documents. However, topic models are high-level statistical tools: a user must scrutinize numerical distributions to understand and explore their results. In this paper, we present a method for visualizing topic models. Our method creates a navigator of the documents, allowing users to explore the hidden structure that a topic model discovers. These browsing interfaces reveal meaningful patterns in a collection, helping end-users explore and understand its contents in new ways. We provide open-source software implementing our method.
OMG, I Have to Tweet that! A Study of Factors that Influence Tweet Rates
Kıcıman, Emre (Microsoft Research)
Many studies have shown that social data such as tweets are a rich source of information about the real world, including, for example, insights into health trends. A key limitation when analyzing Twitter data, however, is that it depends on people self-reporting their own behaviors and observations. In this paper, we present a large-scale quantitative analysis of some of the factors that influence self-reporting bias. In our study, we compare a year of tweets about weather events to ground-truth knowledge about actual weather occurrences. For each weather event we calculate how extreme, how expected, and how big a change the event represents. We calculate the extent to which these factors can explain the daily variations in tweet rates about weather events. We find that we can build global models that combine basic weather information with extremeness, expectation, and change calculations to account for over 40% of the variability in tweet rates. We also build location-specific models (i.e., one model per metropolitan area) that account for an average of 70% of the variability in tweet rates.
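As an illustration of what "accounting for variability" can mean, the share of variance explained by a model is commonly reported as the coefficient of determination, R². The abstract does not say which statistic the authors use, so the sketch below is an assumption; the function name `r_squared` and the numbers in the usage note are hypothetical and not taken from the paper.

```python
def r_squared(observed, predicted):
    """Return the coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: 'variability accounted for' is read here as R^2;
    the abstract does not name the statistic it reports.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

For example, with made-up daily tweet rates `[1, 2, 3, 4]` and model predictions `[1.5, 1.5, 3.5, 3.5]`, the function returns 0.8, meaning the model would account for 80% of the variance in those rates.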
Social Media Is NOT that Bad! The Lexical Quality of Social Media
Rello, Luz (Universitat Pompeu Fabra) | Baeza-Yates, Ricardo (Yahoo! Research)
There is a strong correlation between spelling errors and web text content quality. Using our lexical quality measure, based on a small corpus of spelling errors, we present an estimate of the lexical quality of the main Social Media sites. This paper presents an updated and complete analysis of the lexical quality of Social Media written in English and Spanish, including how lexical quality changes over time.
Using Group Membership Markers for Group Identification
Gawron, Jean Mark (San Diego State University) | Gupta, Dipak (San Diego State University) | Stephens, Kellen (San Diego State University) | Tsou, Ming-Hsiang (San Diego State University) | Spitzberg, Brian (San Diego State University) | An, Li (San Diego State University)
We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and for prioritizing the data found. We compare three ranking systems: one employing a small hand-selected vocabulary based on group membership markers used by insiders to identify members and member properties (us) and outsiders and threats (them), one with a much larger vocabulary, and one with a small vocabulary chosen by Mutual Information. We use the same vocabularies to build classifiers. The ranker that achieves the best correlations with human judgments uses the small us-them vocabulary. We confirm and extend recent results in sentiment analysis (Paltoglou 2010), showing that a feature-weighting scheme taken from classical IR (TFIDF) produces the best ranking system; we also find, surprisingly, that adjusting these weights with SVM training, while producing a better classifier, produces a worse ranker. Increasing vocabulary size similarly improves classification while worsening ranking.
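To make the idea of ranking with TFIDF weights over a small fixed vocabulary concrete, here is a minimal sketch. It is not the authors' system: the function name `tfidf_rank`, the whitespace tokenization, the particular TF and IDF formulas (relative term frequency and log(N/df)), and the toy vocabulary in the usage note are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rank(docs, vocabulary):
    """Rank documents by their summed TF-IDF weight over a fixed vocabulary.

    Assumptions (not from the paper): lowercase whitespace tokenization,
    TF = term count / document length, IDF = log(N / document frequency).
    Returns (order, scores), where order lists document indices from the
    highest-scoring document to the lowest.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each vocabulary term.
    df = {term: sum(1 for toks in tokenized if term in toks) for term in vocabulary}
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        score = 0.0
        for term in vocabulary:
            if df[term] == 0 or not toks:
                continue  # term never occurs, or the document is empty
            tf = counts[term] / len(toks)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Calling `tfidf_rank(["we we they", "the weather is nice", "they said hello to the crowd"], ["we", "they"])` ranks the first document highest and the second lowest, since the second contains no vocabulary term. Note that with this IDF formula a term occurring in every document contributes zero weight.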
A Sentiment-Aware Approach to Community Formation in Social Media
Nguyen, Thin (Deakin University) | Phung, Dinh (Deakin University) | Adams, Brett (Curtin University) | Venkatesh, Svetha (Deakin University)
Participating in a community exemplifies the aspect of sharing, networking and interacting in a social media system. There has been extensive work on characterising on-line communities by their contents and tags using topic modelling tools. However, the role of sentiment and mood has not been studied. Arguably, mood is an integral feature of a text, and becomes more significant in the context of social media: two communities might discuss precisely the same topics, yet within an entirely different atmosphere. Such sentiment-related distinctions are important for many kinds of analysis and applications, such as community recommendation. We present a novel approach to identification of latent hyper-groups in social communities based on users' sentiment. The results show that a sentiment-based approach can yield useful insights into community formation and meta-communities, having potential applications in, for example, mental health (by targeting support or surveillance to communities with negative mood) or in marketing (by targeting customer communities having the same sentiment on similar topics).
Mixed Membership Models for Exploring User Roles in Online Fora
White, Arthur J. (University College Dublin) | Chan, Jeffrey (University of Melbourne) | Hayes, Conor (National University Ireland Galway) | Murphy, Brendan (University College Dublin)
Discussion boards are a form of social media that allows users to discuss topics and exchange information in a complex manner, in a number of different settings. As the popularity of such message boards has increased, communities of users have emerged, and several prominent types of social role have been identified, such as Question Answerer, Celebrity, Discussion Person and Topic Initiator. Recent studies have noted the structural similarity of the egocentric networks of users assigned the same role by qualitative criteria. In this paper, a methodology is developed for clustering together users with similar egocentric network structures. This is achieved using a mixed membership formulation, which allows for the fact that different groups of users may have characteristics in common. The method is then applied to data taken from boards.ie, a medium-sized message board website. Prominent clusters of users are identified and discussed, and illustrative examples of user behaviour are provided. The type of interaction, both locally and globally, taking place within forums is examined.
Evolution of Experts in Question Answering Communities
Pal, Aditya (University of Minnesota) | Chang, Shuo (University of Minnesota) | Konstan, Joseph A. (University of Minnesota)
Community Question Answering (CQA) services thrive as a result of a small number of highly active users, typically called experts, who provide a large number of high-quality, useful answers. Understanding the temporal dynamics and interactions between experts can present key insights into how community members evolve over time. In this paper, we present a temporal study of experts in CQA and analyze the changes in their behavioral patterns over time. Further, using unsupervised machine learning methods, we show the interesting evolution patterns that can help us distinguish experts from one another. Using supervised classification methods, we show that models based on evolutionary data of users can be more effective at expert identification than models that ignore evolution. We run our experiments on two large online CQA services to show the generality of our proposed approach.
Modeling Spread of Disease from Social Interactions
Sadilek, Adam (University of Rochester) | Kautz, Henry (University of Rochester) | Silenzio, Vincent (University of Rochester)
Research in computational epidemiology to date has concentrated on coarse-grained statistical analysis of populations, often synthetic ones. By contrast, this paper focuses on fine-grained modeling of the spread of infectious diseases throughout a large real-world social network. Specifically, we study the roles that social ties and interactions between specific individuals play in the progress of a contagion. We focus on public Twitter data, where we find that for every health-related message there are more than 1,000 unrelated ones. This class imbalance makes classification particularly challenging. Nonetheless, we present a framework that accurately identifies sick individuals from the content of online communication. Evaluation on a sample of 2.5 million geo-tagged Twitter messages shows that social ties to infected, symptomatic people, as well as the intensity of recent co-location, sharply increase one's likelihood of contracting the illness in the near future. To our knowledge, this work is the first to model the interplay of social activity, human mobility, and the spread of infectious disease in a large real-world population. Furthermore, we provide the first quantifiable estimates of the characteristics of disease transmission on a large scale without active user participation, a step towards our ability to model and predict the emergence of global epidemics from day-to-day interpersonal interactions.
Using Complex Event Processing for Modeling Semantic Requests in Real-Time Social Media Monitoring
Riemer, Dominik (FZI Research Center for Information Technologies) | Stojanovic, Ljiljana (FZI Research Center for Information Technologies) | Stojanovic, Nenad (FZI Research Center for Information Technologies)
Social media analytics has been attracting considerable attention in both research and industry due to the increasing popularity of social media usage. As a subset, social media monitoring describes the process of continuous monitoring of a subject matter in social media. From our point of view, the key requirements for such systems are i) high throughput and real-time processing of incoming data, ii) a user-friendly way to define complex situations of interest that make use of formalized background knowledge and iii) capabilities to perform actions based on gained insights, going beyond a pure monitoring system. In this paper, we propose a system for (pro)active, real-time social media monitoring. Firstly, we describe the conceptual architecture of our system and the necessary pre-processing steps. Secondly, we introduce our concept of semantic requests, which can extend event pattern definitions with background knowledge. Finally, we show the usefulness of this system in two different domains: real-time political opinion tracking and proactive establishment of relationships with consumers in order to perform a new form of real-time marketing. The main advantage of our approach is a simplified, expressive way to formulate event patterns in social media applications.
Facebook and Privacy: The Balancing Act of Personality, Gender, and Relationship Currency
Quercia, Daniele (University of Cambridge) | Casas, Diego Las (Universidade Federal de Minas Gerais) | Pesce, Joao Paulo (Universidade Federal de Minas Gerais) | Stillwell, David (University of Cambridge) | Kosinski, Michal (University of Cambridge) | Almeida, Virgilio (Universidade Federal de Minas Gerais) | Crowcroft, Jon (University of Cambridge)
Social media profiles are telling examples of the everyday need for disclosure and concealment. The balance between concealment and disclosure varies across individuals, and personality traits might partly explain this variability. Experimental findings on the relationship between information disclosure and personality have so far been inconsistent. We thus study this relationship anew with 1,313 Facebook users in the United States using two personality tests: the big five personality test and the self-monitoring test. We model the process of information disclosure in a principled way using Item Response Theory and correlate the resulting user disclosure scores with personality traits. We find a correlation with the trait of Openness and observe gender effects, in that men and women share equal amounts of private information, but men tend to make it more publicly available, well beyond their social circles. Interestingly, geographic (e.g., residence, hometown) and work-related information is used as relationship currency, in that it is selectively shared with social contacts and is rarely shared with the Facebook community at large.