From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
O'Connor, Brendan (Carnegie Mellon University) | Balasubramanyan, Ramnath (Carnegie Mellon University) | Routledge, Bryan R. (Carnegie Mellon University) | Smith, Noah A.
We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling. [...] consumer confidence and political opinion, and can also predict future movements in the polls. We find that temporal smoothing is a critically important issue to support a successful model.
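The pipeline this abstract describes (sentiment word frequencies, temporal smoothing, correlation with a poll series) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the positive/negative ratio, the trailing moving average, the window size, and all the numbers are assumptions made up for the example.

```python
from statistics import mean

def sentiment_ratio(pos_counts, neg_counts):
    """Daily ratio of positive to negative sentiment word counts (assumed measure)."""
    return [p / n for p, n in zip(pos_counts, neg_counts)]

def moving_average(series, window):
    """Trailing moving average; the first value covers days 0..window-1."""
    return [mean(series[i - window + 1 : i + 1])
            for i in range(window - 1, len(series))]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical daily counts and poll values, invented for illustration only.
pos = [12, 15, 11, 18, 20, 22, 19, 25]
neg = [10, 10, 11, 12, 10, 11, 10, 10]
poll = [50, 51, 52, 54, 55, 57, 58, 60]

window = 3  # assumed smoothing window, in days
smoothed = moving_average(sentiment_ratio(pos, neg), window)
aligned_poll = poll[window - 1:]  # drop days without a full smoothing window
r = pearson(smoothed, aligned_poll)
```

Varying `window` and re-computing `r` is one simple way to see how much the smoothing choice affects the measured correlation.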
How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media?
Choudhury, Munmun De (Arizona State University) | Lin, Yu-Ru (Arizona State University) | Sundaram, Hari (Arizona State University) | Candan, Kasim Selcuk (Arizona State University) | Xie, Lexing (IBM TJ Watson Research Center) | Kelliher, Aisling (Arizona State University)
Platforms such as Twitter have provided researchers with ample opportunities to analytically study social phenomena. There are, however, significant computational challenges due to the enormous rate of production of new information: researchers are therefore often forced to analyze a judiciously selected "sample" of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute- and topology-based sampling strategies on the discovery of an important social media phenomenon: information diffusion. We examine several widely adopted sampling methods that select nodes based on attribute (random, location, and activity) and topology (forest fire), and we study the impact of attribute-based seed selection on topology-based sampling. We then develop a series of metrics for evaluating the quality of the sample, based on user activity (e.g. volume, number of seeds), topological (e.g. reach, spread) and temporal characteristics (e.g. rate). We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g. location, activity) can improve on naive methods by a significant margin of ~15-20%.
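For readers unfamiliar with the topology-based method named above, a forest fire sample can be sketched as below. This is a simplified, generic version and not the paper's code: the adjacency-dict graph format, the `p_forward` burning probability, and the restart rule when the fire dies out are all assumptions for illustration. An attribute-based seed (for example a user chosen by location or activity) would simply be passed in as `seed`.

```python
import random
from collections import deque

def forest_fire_sample(graph, seed, target_size, p_forward=0.5, rng=None):
    """Sample nodes by 'burning' outward from a seed.

    graph: dict mapping every node to a list of neighbour nodes
           (assumed: each neighbour also appears as a key).
    p_forward must be < 1 so the geometric draw below terminates.
    """
    rng = rng or random.Random(0)
    target_size = min(target_size, len(graph))
    sampled = {seed}
    frontier = deque([seed])
    while len(sampled) < target_size:
        if not frontier:
            # Fire died out: restart from a random node not yet sampled.
            remaining = [n for n in graph if n not in sampled]
            restart = rng.choice(remaining)
            sampled.add(restart)
            frontier.append(restart)
            continue
        node = frontier.popleft()
        unburned = [n for n in graph[node] if n not in sampled]
        rng.shuffle(unburned)
        # Draw a geometrically distributed number of neighbours to burn.
        burn = 0
        while rng.random() < p_forward:
            burn += 1
        for nbr in unburned[:burn]:
            if len(sampled) >= target_size:
                break
            sampled.add(nbr)
            frontier.append(nbr)
    return sampled

# Hypothetical 10-node ring graph, for illustration only.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
sample = forest_fire_sample(ring, seed=0, target_size=4)
```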
Coping With Noise in a Real-World Weblog Crawler and Retrieval System
Lanagan, James (Clarity: Centre For Sensor Web Technologies) | Ferguson, Paul (Clarity: Centre For Sensor Web Technologies) | O'Hare, Neil (Clarity: Centre For Sensor Web Technologies) | Smeaton, Alan F
In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach in order to increase the robustness of the algorithm. We then extend DiffPost by looking at the anchor-text to text ratio, and discover that the time-interval between crawls is more important to the successful application of noise-removal algorithms within the blog context, than any additional improvements to the removal algorithm itself.
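The anchor-text to text ratio mentioned above can be illustrated with a short sketch. How the authors compute or threshold this ratio is not stated here, so the character-based definition and the `anchor_text_ratio` helper below are assumptions: the function reports the fraction of visible text characters that sit inside `<a>` elements, on the intuition that navigation and blogroll blocks are mostly links.

```python
from html.parser import HTMLParser

class AnchorRatioParser(HTMLParser):
    """Counts text characters inside and outside <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.anchor_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())  # ignore surrounding whitespace
        self.total_chars += n
        if self.anchor_depth > 0:
            self.anchor_chars += n

def anchor_text_ratio(html_fragment):
    """Fraction of text characters that are link text (0.0 for empty text)."""
    parser = AnchorRatioParser()
    parser.feed(html_fragment)
    parser.close()
    if parser.total_chars == 0:
        return 0.0
    return parser.anchor_chars / parser.total_chars
```

A block scoring close to 1.0 is almost entirely links and is a plausible candidate for removal; any cut-off value would need to be tuned on real crawl data.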
ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews
Tsur, Oren (The Hebrew University) | Davidov, Dmitry (The Hebrew University) | Rappoport, Ari (The Hebrew University)
Sarcasm is a sophisticated form of speech act widely used in online communities. Automatic recognition of sarcasm is, however, a novel task. Sarcasm recognition could contribute to the performance of review summarization and ranking systems. This paper presents SASI, a novel Semi-supervised Algorithm for Sarcasm Identification that recognizes sarcastic sentences in product reviews. SASI has two stages: semi-supervised pattern acquisition, and sarcasm classification. We experimented on a data set of about 66000 Amazon reviews for various books and products. Using a gold standard in which each sentence was tagged by 3 annotators, we obtained precision of 77% and recall of 83.1% for identifying sarcastic sentences. We found some strong features that characterize sarcastic utterances. However, a combination of more subtle pattern-based features proved more promising in identifying the various facets of sarcasm. We also speculate on the motivation for using sarcasm in online communities and social networks.
Trading Strategies to Exploit Blog and News Sentiment
Zhang, Wenbin (Stony Brook University) | Skiena, Steven (Stony Brook University)
We use quantitative media (blogs, and news as a comparison) data generated by a large-scale natural language processing (NLP) text analysis system to perform a comprehensive and comparative study of how company-related news variables anticipate or reflect the company's stock trading volumes and financial returns. Building on our findings, we give a sentiment-based market-neutral trading strategy which gives consistently favorable returns with low volatility over a long period. Our results are significant in confirming the performance of general blog and news sentiment analysis methods over broad domains and sources. Moreover, several remarkable differences between news and blogs are also identified.
What's Worthy of Comment? Content and Comment Volume in Political Blogs
Yano, Tae (Carnegie Mellon University) | Smith, Noah A. (Carnegie Mellon University)
What makes a blog post noteworthy? One measure of the popularity or breadth of interest of a blog post is the extent to which readers of the blog are inspired to leave comments on the post. In this paper, we study the relationship between the text contents of a blog post and the volume of response it will receive from blog readers. Modeling this relationship has the potential to reveal the interests of a blog's readership community to its authors, readers, advertisers, and scientists [...] In research on blog data, comments are often ignored, and it is easy to see why: comments are very noisy, full of nonstandard grammar and spelling, usually unedited, often cryptic and uninformative, at least to those outside the blog's community. A few studies have focused on information in comments. Mishne and Glance (2006) showed the value of comments in characterizing the social repercussions of a post, including popularity and controversy. Their large-scale user study correlated popularity and comment activity.
Predicting the Speed, Scale, and Range of Information Diffusion in Twitter
Yang, Jiang (University of Michigan) | Counts, Scott (Microsoft Research)
We present results of network analyses of information diffusion on Twitter, via users' ongoing social interactions as denoted by "@username" mentions. Incorporating survival analysis, we constructed a novel model to capture the three major properties of information diffusion: speed, scale, and range. On the whole, we find that some properties of the tweets themselves predict greater information propagation but that properties of the users, the rate with which a user is mentioned historically in particular, are equal or stronger predictors. Implications for end users and system designers are discussed.
Longevity in Second Life
Teng, ChunYuen (University of Michigan, Ann Arbor) | Adamic, Lada (University of Michigan, Ann Arbor)
The past few years have seen a rise in number and popularity of online spaces where individuals can socialize, play, and learn. All of these spaces face the challenge of retaining the interest of users over time. We study this problem in the context of Second Life (SL). [...] SL also makes it easy to meet and interact with new people. 4. Transaction: Creating content or providing services in SL can be profitable, with 150M USD in user-to-user transactions taking place in the third quarter of 2009 (Linden [...]
Why do Users Tag? Detecting Users' Motivation for Tagging in Social Tagging Systems
Strohmaier, Markus (Graz University of Technology and Know-Center) | Körner, Christian (Graz University of Technology) | Kern, Roman (Know-Center)
While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question: 1.) What motivates users to tag resources, and in what ways is user motivation amenable to quantitative analysis? 2.) Does users' motivation for tagging vary within and across social tagging systems, and if so how? and 3.) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply the measures to datasets from 8 different tagging systems. Our results show that a) users' motivation for tagging varies not only across, but also within tagging systems, and that b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for (i) the development of tag recommenders, (ii) the analysis of tag semantics and (iii) the design of search algorithms for social tagging systems.
Modeling Group Dynamics in Virtual Worlds
Shah, Fahad (University of Central Florida) | Sukthankar, Gita Reese (University of Central Florida) | Usher, Chris (University of Hawaii at Hilo)
In this study, we examine human social interactions within virtual worlds and address the question of how group interactions are affected by the game environment. To investigate this problem, we introduced a set of conversational agents into the social environment of Second Life, a massively multi-player online environment that allows users to construct and inhabit their own 3D world. Our agents were created to be sufficiently lifelike to casual observers, so as not to perturb neighboring social interactions. Using our partitioning algorithm, we separated continuous public chat logs from each region into separate conversations which were used to construct a social network of the participants. Unlike many groups formed in communities and workplaces, groups in Second Life can be rapidly-forming (arising from few interactions), persistent (remaining stable over a long period), and are less affected by socio-cultural influences. In this paper, we analyze regional differences in Second Life by measuring characteristics of the network as a whole, determined from the statistics mined from public conversations in the virtual world, rather than focusing on egocentric actors and their attributes.
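The two steps described above (partitioning a continuous chat log into conversations, then building a social network from them) might look roughly like the sketch below. The authors' partitioning algorithm is not specified in this abstract, so a simple time-gap rule is swapped in as a stand-in; the `max_gap` threshold, the `(timestamp, speaker)` message format, and the co-participation edge weights are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def split_conversations(messages, max_gap=60):
    """Split a time-sorted list of (timestamp_seconds, speaker) tuples
    wherever the silence between messages exceeds max_gap seconds."""
    conversations = []
    current = []
    last_time = None
    for ts, speaker in messages:
        if last_time is not None and ts - last_time > max_gap:
            conversations.append(current)
            current = []
        current.append((ts, speaker))
        last_time = ts
    if current:
        conversations.append(current)
    return conversations

def build_network(conversations):
    """Weighted undirected edges: how many conversations each pair shared."""
    edges = Counter()
    for conv in conversations:
        speakers = sorted({speaker for _, speaker in conv})
        for a, b in combinations(speakers, 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical chat log from one region, invented for illustration.
log = [(0, "ann"), (10, "bob"), (20, "ann"), (500, "cat"), (510, "bob")]
convs = split_conversations(log)
network = build_network(convs)
```

Network-level statistics such as density or degree distribution can then be computed over `network` per region, which matches the whole-network focus the abstract describes.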