Beyond Trending Topics: Real-World Event Identification on Twitter
Becker, Hila (Columbia University) | Naaman, Mor (Rutgers University) | Gravano, Luis (Columbia University)
User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.
Improving Text Clustering with Social Tagging
Ares, M. Eduardo (University of A Coruña) | Parapar, Javier (University of A Coruña) | Barreiro, Álvaro (University of A Coruña)
Lately several web-based tagging systems such as Technorati, Flickr or Delicious have become very popular. In this paper we will exploit the information created by the community in Delicious: a social bookmarking service where the users can save the URLs of their favourite webpages offering also the possibility of associating tags to them. On the other hand the clustering methods are a very important data mining tool in order to exploit the knowledge present in data collections. In the last years a new family of [...] Another important question is the absoluteness of the constraints. Even if we use this approach to turn tags into constraints, a fair amount of them are bound to be inaccurate (i.e., linking documents which should not be in the same cluster) until a high value of the parameter t, due to the polysemy of the terms used as tags or to differences in the criteria of the taggers. Consequently, we have used soft positive constraints, meaning that the documents affected by one of them are likely to be in the same cluster, without forcing the clustering algorithm to actually put them so.
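The idea of turning tags into soft positive constraints can be illustrated with a minimal sketch. It assumes, since the excerpt above does not define it, that the parameter t is the minimum number of shared tags required to link two documents. The function name `soft_positive_constraints` and the sample data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def soft_positive_constraints(doc_tags, t):
    """Return (doc_a, doc_b, shared_count) for every pair sharing >= t tags.

    ASSUMPTION: t is read here as a shared-tag threshold; the excerpt does
    not state its definition. The pairs are only candidate "soft" links:
    a clustering algorithm would treat them as preferences, not as rules.
    """
    constraints = []
    for doc_a, doc_b in combinations(sorted(doc_tags), 2):
        shared = len(doc_tags[doc_a] & doc_tags[doc_b])
        if shared >= t:
            constraints.append((doc_a, doc_b, shared))
    return constraints


# Hypothetical bookmarks: "python" is polysemous (language vs. animal).
docs = {
    "d1": {"python", "web", "tutorial"},
    "d2": {"python", "web"},
    "d3": {"python", "snake"},
}

loose = soft_positive_constraints(docs, 1)
strict = soft_positive_constraints(docs, 2)
print(loose)
print(strict)
```

With the threshold at 1, the ambiguous tag alone links the programming pages to the page about snakes. Raising it to 2 drops those links, which is one way to read the remark that inaccurate constraints persist until t is high.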
Prominence Ranking in Graphs with Community Structure
Adali, Sibel (Rensselaer Polytechnic Institute) | Lu, Xiaohui (Rensselaer Polytechnic Institute) | Magdon-Ismail, Malik (Rensselaer Polytechnic Institute) | Purnell, Jonathan (Rensselaer Polytechnic Institute)
We consider prominence ranking in graphs involving actors, their artifacts and the artifact groups. When multiple actors contributing to an artifact constitutes a social tie, associations between the artifacts can be used to infer prominence among actors. This is because prominent actors will tend to collaborate on prominent artifacts, and prominent artifacts will be associated with other prominent artifacts. Our testbed example is the DBLP co-authorship graph: multiple authors (the actors) collaborate to publish research papers (the artifacts); collaboration is the social tie. Papers have prominence themselves (e.g., quality and impact of the work) and the prominence of the venues is tied to the prominence of the papers in them. We use our methods to infer prominence based on the venue-based associations of papers, and compare our rankings with external citation-based measures of prominence. We compare with numerous other ranking algorithms, and show that the ranking performance gain from using the venues is statistically significant. What if there are no natural artifact groups like venues? We develop a new algorithm which uses discovered artifact groups. Our approach consists of two steps. First, we find artifact groups by linking artifacts with common contributors. Note that instead of finding communities of actors, we consider communities of artifacts. We then use these grouped artifacts in the prominence ranking algorithm. We consider different methods for obtaining the artifact groups, in particular a very efficient embedding-based algorithm for graph clustering, and show the effectiveness of our method in improving the ranking of actors. The inferred groups are as good as or better than the natural conference venues for DBLP.
Classifying the Political Leaning of News Articles and Users from User Votes
Zhou, Daniel Xiaodan (University of Michigan) | Resnick, Paul (University of Michigan) | Mei, Qiaozhu (University of Michigan)
Social news aggregator services generate readers’ subjective reactions to news opinion articles. Can we use those as a resource to classify articles as liberal or conservative, even without knowing the self-identified political leaning of most users? We applied three semi-supervised learning methods that propagate classifications of political news articles and users as conservative or liberal, based on the assumption that liberal users will vote for liberal articles more often, and similarly for conservative users and articles. Starting from a few labeled articles and users, the algorithms propagate political leaning labels to the entire graph. In cross-validation, the best algorithm achieved 99.6% accuracy on held-out users and 96.3% accuracy on held-out articles. Adding social data such as users’ friendship or text features such as cosine similarity did not improve accuracy. The propagation algorithms, using the subjective liking data from users, also performed better than an SVM based text classifier, which achieved 92.0% accuracy on articles.
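The propagation idea in this abstract can be sketched as a generic label-propagation loop over the bipartite user-article vote graph. This is not one of the three methods the authors evaluated, which the abstract does not specify. The function name `propagate_leaning`, the score convention (+1.0 liberal, -1.0 conservative), and the toy votes are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_leaning(votes, seeds, iterations=20):
    """Spread seed labels over a bipartite user-article vote graph.

    votes: iterable of (user, article) pairs, each meaning "user voted
           for article".
    seeds: dict mapping ("user", id) or ("article", id) to +1.0 or -1.0.
    Returns a dict mapping every node that appears in a vote to a score
    in [-1, 1]. ASSUMPTION: plain neighbour averaging with clamped seeds,
    a generic scheme rather than the paper's algorithms.
    """
    neighbors = defaultdict(set)
    for user, article in votes:
        u, a = ("user", user), ("article", article)
        neighbors[u].add(a)
        neighbors[a].add(u)

    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes stay fixed
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Hypothetical toy data: two groups of voters, one labeled article each.
votes = [("u1", "a1"), ("u1", "a2"), ("u2", "a2"),
         ("u3", "a3"), ("u3", "a4"), ("u4", "a4")]
seeds = {("article", "a1"): 1.0, ("article", "a3"): -1.0}

scores = propagate_leaning(votes, seeds)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

In this toy graph, user u2 never voted for a labeled article but still receives a positive score through article a2, which shares a voter with the labeled article a1.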
Culture Matters: A Survey Study of Social Q&A Behavior
Yang, Jiang (University of Michigan) | Morris, Meredith Ringel (Microsoft Research) | Teevan, Jaime (Microsoft Research) | Adamic, Lada A. (University of Michigan) | Ackerman, Mark S. (University of Michigan)
Online social networking tools are used around the world by people to ask questions of their friends, because friends provide direct, reliable, contextualized, and interactive responses. However, although the tools used in different cultures for question asking are often very similar, the way they are used can be very different, reflecting unique inherent cultural characteristics. We present the results of a survey designed to elicit cultural differences in people’s social question asking behaviors across the United States, the United Kingdom, China, and India. The survey received responses from 933 people distributed across the four countries who held similar job roles and were employed by a single organization. Responses included information about the questions they ask via social networking tools, and their motivations for asking and answering questions online. The results reveal culture as a consistently significant factor in predicting people’s social question and answer behavior. The prominent cultural differences we observe might be traced to people’s inherent cultural characteristics (e.g., their cognitive patterns and social orientation), and should be comprehensively considered in designing social search systems.
Natural Language Processing to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency
Verma, Sudha (University of Colorado) | Vieweg, Sarah (University of Colorado) | Corvey, William J. (University of Colorado) | Palen, Leysia (University of Colorado) | Martin, James H. (University of Colorado) | Palmer, Martha (University of Colorado) | Schram, Aaron (University of Colorado) | Anderson, Kenneth M. (University of Colorado)
In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.
What Stops Social Epidemics?
Ver Steeg, Greg (University of Southern California) | Ghosh, Rumi (University of Southern California) | Lerman, Kristina (University of Southern California)
Theoretical progress in understanding the dynamics of spreading processes on graphs suggests the existence of an epidemic threshold below which no epidemics form and above which epidemics spread to a significant fraction of the graph. We have observed information cascades on the social media site Digg that spread fast enough for one initial spreader to infect hundreds of people, yet end up affecting only 0.1% of the entire network. We find that two effects, previously studied in isolation, combine cooperatively to drastically limit the final size of cascades on Digg. First, because of the highly clustered structure of the Digg network, most people who are aware of a story have been exposed to it via multiple friends. This structure lowers the epidemic threshold while moderately slowing the overall growth of cascades. In addition, we find that the mechanism for social contagion on Digg points to a fundamental difference between information spread and other contagion processes: despite multiple opportunities for infection within a social group, people are less likely to become spreaders of information with repeated exposure. The consequences of this mechanism become more pronounced for more clustered graphs. Ultimately, this effect severely curtails the size of social epidemics on Digg.
Diversity Measurement of Recommender Systems under Different User Choice Models
Szlávik, Zoltán (VU University Amsterdam) | Kowalczyk, Wojtek (VU University Amsterdam) | Schut, Martijn (VU University Amsterdam)
Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice. Whilst much research effort is spent on developing recommenders further, little to no effort is spent on analysing their impact, on either the supply (company) or the demand (consumer) side. In this paper, we investigate the diversity impact of a movie recommender. We define diversity for different parts of the domain and measure it in different ways. The novelty of our work is the usage of real rating data (from Netflix) and a recommender system for investigating the (hypothetical) effects of various configurations of the system and users' choice models. We consider a number of different scenarios (which differ in the agent's choice model), run very extensive simulations, analyse various measurements regarding experimental validation and diversity, and report on selected findings. The choice models are an essential part of our work, since these can be influenced by the owner of the recommender once deployed.
Participation Maximization Based on Social Influence in Online Discussion Forums
Sun, Tao (Peking University and Microsoft Research Asia) | Chen, Wei (Microsoft Research Asia) | Liu, Zhenming (Harvard School of Engineering and Applied Sciences and Microsoft Research Asia) | Wang, Yajun (Microsoft Research Asia) | Sun, Xiaorui (Shanghai Jiaotong University and Microsoft Research Asia) | Zhang, Ming (Peking University) | Lin, Chin-Yew (Microsoft Research Asia)
In online discussion forums, users are more motivated to take part in discussions when they observe other users’ participation, an effect of social influence among forum users. In this paper, we study how to utilize social influence for increasing the overall forum participation. To this end, we propose a mechanism to maximize user influence and boost participation by displaying forum threads to users. We formally define the participation maximization problem, and show that it is a special instance of the social welfare maximization problem with submodular utility functions and that it is NP-hard. However, generic approximation algorithms are impracticable for real-world forums due to their time complexity. Thus we design a heuristic algorithm, named Thread Allocation Based on Influence (TABI), to tackle the problem. Through extensive experiments using a dataset from a real-world online forum, we demonstrate that TABI consistently outperforms all other algorithms in maximizing participation. The results of this work demonstrate that current recommender systems can be made more effective by considering future influence propagations. The problem of participation maximization based on influence also opens a new direction in the study of social influence.
Memes Online: Extracted, Subtracted, Injected, and Recollected
Simmons, Matthew P. (University of Michigan) | Adamic, Lada A. (University of Michigan) | Adar, Eytan (University of Michigan)
Social media is playing an increasingly vital role in information dissemination. But with dissemination being more distributed, content often makes multiple hops, and consequently has opportunity to change. In this paper we focus on content that should be changing the least, namely quoted text. We find changes to be frequent, with their likelihood depending on the authority of the copied source and the type of site that is copying. We uncover patterns in the rate of appearance of new variants, their length, and popularity, and develop a simple model that is able to capture them. These patterns are distinct from ones produced when all copies are made from the same source, suggesting that information is evolving as it is being processed collectively in online social media.