Genre
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
O'Connor, Brendan (Carnegie Mellon University) | Balasubramanyan, Ramnath (Carnegie Mellon University) | Routledge, Bryan R. (Carnegie Mellon University) | Smith, Noah A.
We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find that they correlate with sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and they capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling. Twitter sentiment tracks consumer confidence and political opinion, and can also predict future movements in the polls. We find that temporal smoothing is critically important to support a successful model.
How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media?
De Choudhury, Munmun (Arizona State University) | Lin, Yu-Ru (Arizona State University) | Sundaram, Hari (Arizona State University) | Candan, Kasim Selcuk (Arizona State University) | Xie, Lexing (IBM TJ Watson Research Center) | Kelliher, Aisling (Arizona State University)
Platforms such as Twitter have provided researchers with ample opportunities to study social phenomena analytically. There are, however, significant computational challenges due to the enormous rate at which new information is produced: researchers are therefore often forced to analyze a judiciously selected “sample” of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to graph topology. This paper studies the impact of different attribute-based and topology-based sampling strategies on the discovery of an important social media phenomenon: information diffusion. We examine several widely adopted sampling methods that select nodes based on attributes (random, location, and activity) and on topology (forest fire), and we also study the impact of attribute-based seed selection on topology-based sampling. We then develop a series of metrics for evaluating the quality of a sample, based on user activity (e.g. volume, number of seeds), topological characteristics (e.g. reach, spread), and temporal characteristics (e.g. rate). We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g. location, activity) can improve on naive methods by a significant margin of ~15-20%.
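To make the combination of topology-based sampling and attribute-based seed selection concrete, here is a minimal sketch of Forest Fire sampling over an adjacency dict, with a pluggable seed picker. The forward-burning probability p_forward, the geometric burn count, and the seed_picker hook are assumptions for illustration; the paper's exact parameters and variant are not given here.

```python
import random
from collections import deque


def forest_fire_sample(graph, target_size, p_forward=0.7, seed_picker=None, rng=None):
    """Forest Fire sampling over an adjacency dict {node: [neighbours]}.

    seed_picker(candidates) chooses where each new fire starts; pass an
    attribute-based picker (e.g. most active user) to combine user context with
    topology, or leave it as None for uniformly random seeds.
    """
    if not 0 <= p_forward < 1:
        raise ValueError("p_forward must be in [0, 1)")
    rng = rng if rng is not None else random.Random()
    pick = seed_picker or rng.choice
    nodes = list(graph)
    target_size = min(target_size, len(nodes))
    sampled = set()
    while len(sampled) < target_size:
        # Start (or restart, if the last fire died out) from an unsampled node.
        start = pick([n for n in nodes if n not in sampled])
        sampled.add(start)
        frontier = deque([start])
        while frontier and len(sampled) < target_size:
            node = frontier.popleft()
            # Burn a geometrically distributed number of neighbours (mean p/(1-p)).
            burn = 0
            while rng.random() < p_forward:
                burn += 1
            unburned = [v for v in graph.get(node, ()) if v not in sampled]
            rng.shuffle(unburned)
            for v in unburned[:burn]:
                if len(sampled) >= target_size:
                    break
                sampled.add(v)
                frontier.append(v)
    return sampled
```

A 30% sample would use target_size = int(0.3 * len(graph)); an activity-based seed picker such as `lambda cands: max(cands, key=activity.get)` (where `activity` is a hypothetical per-user activity dict) gives a hybrid attribute-plus-topology sample.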
Coping With Noise in a Real-World Weblog Crawler and Retrieval System
Lanagan, James (Clarity: Centre For Sensor Web Technologies) | Ferguson, Paul (Clarity: Centre For Sensor Web Technologies) | O'Hare, Neil (Clarity: Centre For Sensor Web Technologies) | Smeaton, Alan F.
In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to removing noise from blog pages, examining the difficulties encountered when crawling the blogosphere to build a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach to increase the robustness of the algorithm. We then extend DiffPost by considering the ratio of anchor text to text, and discover that the time interval between crawls matters more to the successful application of noise-removal algorithms in the blog context than any additional improvement to the removal algorithm itself.
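The core DiffPost idea, as summarized here, is that content repeated across two pages of the same blog (templates, navigation, footers) is noise, and the anchor-text ratio flags link-heavy blocks. The sketch below assumes pages are already split into HTML blocks, uses a rough regex-based tag stripper, and picks a hypothetical max_anchor_ratio of 0.5; none of these choices are taken from the paper.

```python
import re

ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def visible_text(fragment):
    """Strip tags and collapse whitespace (a rough stand-in for an HTML parser)."""
    return " ".join(TAG.sub(" ", fragment).split())


def anchor_ratio(fragment):
    """Length of anchor (link) text divided by length of all visible text."""
    text = visible_text(fragment)
    if not text:
        return 0.0
    anchor_text = " ".join(visible_text(a) for a in ANCHOR.findall(fragment))
    return len(anchor_text) / len(text)


def diffpost_filter(page_blocks, other_page_blocks, max_anchor_ratio=0.5):
    """Keep text blocks of one page that do not also appear on another page of
    the same blog and are not dominated by link text."""
    shared = {visible_text(b) for b in other_page_blocks}
    kept = []
    for block in page_blocks:
        text = visible_text(block)
        if not text or text in shared:
            continue  # appears on both pages: treat as template noise
        if anchor_ratio(block) > max_anchor_ratio:
            continue  # mostly links: treat as navigation or blogroll noise
        kept.append(text)
    return kept
```

The comparison only works if the two pages differ in their real content, which is one reason the timing of crawls can matter so much in practice.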
Trading Strategies to Exploit Blog and News Sentiment
Zhang, Wenbin (Stony Brook University) | Skiena, Steven (Stony Brook University)
We use quantitative media data (from blogs, with news as a comparison) generated by a large-scale natural language processing (NLP) text analysis system to perform a comprehensive, comparative study of how company-related news variables anticipate or reflect a company's stock trading volume and financial returns. Building on our findings, we give a sentiment-based, market-neutral trading strategy that yields consistently favorable returns with low volatility over a long period. Our results are significant in confirming the performance of general blog and news sentiment analysis methods over broad domains and sources. Moreover, we identify several remarkable differences between news and blogs.
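A generic way to turn per-company sentiment into a market-neutral position is to go long the most positively covered stocks and short the most negatively covered ones in equal dollar amounts, so that broad market moves largely cancel. The sketch below is that generic long/short construction, assuming a {ticker: sentiment} dict and a hypothetical quantile of 0.2; it is not the specific strategy from the paper.

```python
def market_neutral_weights(sentiment, quantile=0.2):
    """Equal-weight long/short portfolio from a {ticker: sentiment score} dict.

    Goes long the highest-sentiment `quantile` of tickers and short the lowest,
    so long weights sum to +1 and short weights to -1 (dollar-neutral).
    """
    ranked = sorted(sentiment, key=sentiment.get)  # ascending sentiment
    n = max(1, int(len(ranked) * quantile))
    if 2 * n > len(ranked):
        raise ValueError("need at least 2 * n tickers so the legs do not overlap")
    shorts, longs = ranked[:n], ranked[-n:]
    weights = {t: 1.0 / n for t in longs}
    weights.update({t: -1.0 / n for t in shorts})
    return weights


def portfolio_return(weights, returns):
    """One-period return of the weighted portfolio given {ticker: return}."""
    return sum(w * returns[t] for t, w in weights.items())
```

Rebalancing these weights each period from fresh sentiment scores and accumulating portfolio_return gives a simple backtest loop.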
Predicting the Speed, Scale, and Range of Information Diffusion in Twitter
Yang, Jiang (University of Michigan) | Counts, Scott (Microsoft Research)
We present results of network analyses of information diffusion on Twitter, via users’ ongoing social interactions as denoted by “@username” mentions. Incorporating survival analysis, we constructed a novel model to capture the three major properties of information diffusion: speed, scale, and range. On the whole, we find that some properties of the tweets themselves predict greater information propagation, but that properties of the users (in particular, the rate at which a user has historically been mentioned) are equal or stronger predictors. Implications for end users and system designers are discussed.
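Survival analysis fits the "speed" question naturally: each tweet has a time until it is first propagated, and tweets never propagated within the observation window are right-censored. As a generic illustration, the sketch below implements the Kaplan-Meier estimator; the abstract does not say which survival model the paper used, so this should not be read as the authors' model.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: time until the event (e.g. first propagation) or until censoring.
    observed:  True if the event happened, False if the item was censored.
    Returns [(time, estimated probability of surviving past time)] at event times.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Everyone who had the event or was censored at t leaves the risk set.
        at_risk -= events + censored
    return curve
```

A faster-falling curve for one group of tweets (for example, those from frequently mentioned users) would indicate faster propagation for that group.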
Longevity in Second Life
Teng, ChunYuen (University of Michigan, Ann Arbor) | Adamic, Lada (University of Michigan, Ann Arbor)
The past few years have seen a rise in the number and popularity of online spaces where individuals can socialize, play, and learn. All of these spaces face the challenge of retaining the interest of users over time. We study this problem in the context of Second Life (SL). SL also makes it easy to meet and interact with new people. 4. Transaction: Creating content or providing services in SL can be profitable, with 150M USD in user-to-user transactions taking place in the third quarter of 2009 (Linden
Why do Users Tag? Detecting Users’ Motivation for Tagging in Social Tagging Systems
Strohmaier, Markus (Graz University of Technology and Know-Center) | Körner, Christian (Graz University of Technology) | Kern, Roman (Know-Center)
While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question: 1.) What motivates users to tag resources, and in what ways is user motivation amenable to quantitative analysis? 2.) Does users' motivation for tagging vary within and across social tagging systems, and if so how? and 3.) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply the measures to datasets from 8 different tagging systems. Our results show that a) users' motivation for tagging varies not only across, but also within tagging systems, and that b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for (i) the development of tag recommenders, (ii) the analysis of tag semantics and (iii) the design of search algorithms for social tagging systems.
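Measures for separating "categorizers" (small, reused vocabularies) from "describers" (many, often one-off tags) can be computed from a single user's tag assignments. The sketch below shows two measures along those lines: a tag/resource ratio and an orphaned-tag ratio, where a tag is treated as orphaned if it is used on at most ceil(max_use / 100) resources. These definitions are assumptions for illustration; consult the paper for its exact formulations.

```python
import math


def tag_resource_ratio(assignments):
    """Distinct tags per distinct resource for one user's (resource, tag) pairs.
    Low values suggest a categorizer; high values suggest a describer."""
    tags = {t for _, t in assignments}
    resources = {r for r, _ in assignments}
    return len(tags) / len(resources)


def orphan_ratio(assignments):
    """Fraction of a user's tags that are rarely used (assumed threshold:
    at most ceil(max_use / 100) resources, where max_use is the usage count
    of the user's most-used tag)."""
    resources_per_tag = {}
    for r, t in assignments:
        resources_per_tag.setdefault(t, set()).add(r)
    max_use = max(len(rs) for rs in resources_per_tag.values())
    n = math.ceil(max_use / 100)
    orphans = [t for t, rs in resources_per_tag.items() if len(rs) <= n]
    return len(orphans) / len(resources_per_tag)
```

Computing both measures per user, and then comparing their distributions within and across systems, is one way to examine how motivation varies.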
Modeling Group Dynamics in Virtual Worlds
Shah, Fahad (University of Central Florida) | Sukthankar, Gita Reese (University of Central Florida) | Usher, Chris (University of Hawaii at Hilo)
In this study, we examine human social interactions within virtual worlds and address the question of how group interactions are affected by the game environment. To investigate this problem, we introduced a set of conversational agents into the social environment of Second Life, a massively multi-player online environment that allows users to construct and inhabit their own 3D world. Our agents were created to be sufficiently lifelike to casual observers so as not to perturb neighboring social interactions. Using our partitioning algorithm, we separated the continuous public chat logs from each region into separate conversations, which were then used to construct a social network of the participants. Unlike many groups formed in communities and workplaces, groups in Second Life can be rapidly forming (arising from few interactions) and persistent (remaining stable over a long period), and are less affected by socio-cultural influences. In this paper, we analyze regional differences in Second Life by measuring characteristics of the network as a whole, determined from statistics mined from public conversations in the virtual world, rather than focusing on egocentric actors and their attributes.
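Once chat logs have been split into conversations, a whole-network view can be obtained by linking every pair of people who took part in the same conversation and then computing global statistics. The sketch below assumes conversations are already partitioned (the paper's partitioning algorithm is not reproduced) and that co-participation defines an edge; density and average clustering are shown as two example network-level measures.

```python
from itertools import combinations


def build_network(conversations):
    """Undirected co-participation network from lists of conversation participants."""
    nodes, edges = set(), set()
    for participants in conversations:
        people = set(participants)
        nodes |= people
        for a, b in combinations(sorted(people), 2):
            edges.add((a, b))
    return nodes, edges


def density(nodes, edges):
    """Fraction of possible undirected edges that are present."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


def average_clustering(nodes, edges):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = 0.0
    for v in nodes:
        k = len(neighbors[v])
        if k < 2:
            continue
        links = sum(1 for x, y in combinations(neighbors[v], 2) if y in neighbors[x])
        total += 2 * links / (k * (k - 1))
    return total / len(nodes) if nodes else 0.0
```

Computing these statistics separately for each region's network gives a simple basis for comparing regions.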
Classifier Calibration for Multi-Domain Sentiment Classification
Raaijmakers, Stephan (TNO ICT, Delft, The Netherlands) | Kraaij, Wessel (TNO ICT, Delft, The Netherlands)
Textual sentiment classifiers classify texts into a fixed number of affective classes, such as positive, negative or neutral sentiment, or subjective versus objective information. It has been observed that sentiment classifiers suffer from a lack of generalization capability: a classifier trained on a certain domain generally performs worse on data from another domain. This phenomenon has been attributed to domain-specific affective vocabulary. In this paper, we propose a voting-based thresholding approach, which calibrates a number of existing single-domain classifiers with respect to sentiment data from a new domain. The approach presupposes only a small amount of annotated data from the new domain. We evaluate three criteria for estimating thresholds, and discuss the ramifications of these criteria for the trade-off between classifier performance and manual annotation effort.
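The thresholding idea can be illustrated with score-producing classifiers trained on other domains: each classifier's decision threshold is re-fitted on the small annotated sample from the new domain, and the recalibrated classifiers then vote. The sketch below uses accuracy maximization as the single threshold criterion and majority voting; both choices are assumptions, since the paper evaluates three criteria that are not spelled out here, and the classifiers in the usage are hypothetical scoring functions.

```python
def best_threshold(scores, labels):
    """Threshold on classifier scores that maximizes accuracy on the small
    labeled sample from the new domain (labels: True = positive)."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf = predict all negative
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def calibrate(classifiers, texts, labels):
    """One re-fitted threshold per existing single-domain classifier."""
    return [best_threshold([clf(x) for x in texts], labels) for clf in classifiers]


def vote(classifiers, thresholds, text):
    """Positive if a strict majority of calibrated classifiers say positive."""
    votes = sum(clf(text) >= t for clf, t in zip(classifiers, thresholds))
    return votes * 2 > len(classifiers)
```

Usage: `thresholds = calibrate(clfs, sample_texts, sample_labels)` followed by `vote(clfs, thresholds, new_text)`; enlarging the labeled sample trades more annotation effort for more reliable thresholds.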
A Comparison of Information Seeking Using Search Engines and Social Networks
Morris, Meredith Ringel (Microsoft Research) | Teevan, Jaime (Microsoft Research) | Panovich, Katrina (Massachusetts Institute of Technology)
The Web has become an important information repository; often it is the first source a person turns to with an information need. One common way to search the Web is with a search engine. However, it is not always easy for people to find what they are looking for with keyword search, and at times the desired information may not be readily available online. An alternative, facilitated by the rise of social media, is to pose a question to one's online social network. In this paper, we explore the pros and cons of using a social networking tool to fill an information need, as compared with a search engine. We describe a study in which 12 participants searched the Web while simultaneously posing a question on the same topic to their social network, and we compare the results they found by each method.