From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series

AAAI Conferences

We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling. [...] consumer confidence and political opinion, and can also predict future movements in the polls. We find that temporal smoothing is a critically important issue to support a successful model.
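As a rough illustration of the kind of analysis this abstract describes, the sketch below smooths a daily sentiment score with a trailing moving average and correlates it with a poll series. The counts, the poll values, the window length `k`, and the function names are all illustrative assumptions, not the paper's data or code.

```python
from math import sqrt


def moving_average(values, k):
    """Trailing moving average over the last k values (shorter at the start)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - k + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily counts of positive and negative sentiment words.
positive = [120, 90, 150, 80, 160, 100, 170, 110, 180, 130]
negative = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
# Hypothetical poll index for the same ten days.
poll = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]

# Daily score: ratio of positive to negative word counts (an assumed definition).
daily_score = [p / n for p, n in zip(positive, negative)]

raw_r = pearson(daily_score, poll)
smooth_r = pearson(moving_average(daily_score, k=4), poll)
print(f"raw r = {raw_r:.2f}, smoothed r = {smooth_r:.2f}")
```

On this toy data the day-to-day noise hides the upward trend in the raw score, and smoothing recovers it, which is the effect the abstract points to when it calls temporal smoothing critical.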


How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media?

AAAI Conferences

Platforms such as Twitter have provided researchers with ample opportunities to analytically study social phenomena. There are, however, significant computational challenges due to the enormous rate of production of new information: researchers are therefore often forced to analyze a judiciously selected "sample" of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute and topology based sampling strategies on the discovery of an important social media phenomenon: information diffusion. We examine several widely-adopted sampling methods that select nodes based on attribute (random, location, and activity) and topology (forest fire) as well as study the impact of attribute based seed selection on topology based sampling. Then we develop a series of metrics for evaluating the quality of the sample, based on user activity (e.g. volume, number of seeds), topological (e.g. reach, spread) and temporal characteristics (e.g. rate). We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g. location, activity) can improve on naive methods by a significant margin of ~15-20%.
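The combination of attribute based seed selection with topology based sampling can be sketched as follows. This is a simplified forest fire variant written for illustration: each unvisited neighbour burns with a fixed probability `p_forward`, and the seed is chosen as the most active user. The graph, the `activity` counts, and all parameter values are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


def forest_fire_sample(graph, seed_node, target_size, p_forward=0.5, rng=None):
    """Return a set of up to target_size nodes using a simplified forest fire.

    graph maps every node to a list of its neighbours (each neighbour must
    also be a key). From each burning node, every unsampled neighbour catches
    fire with probability p_forward, a simplification of the usual model.
    """
    rng = rng or random.Random()
    sampled = {seed_node}
    frontier = deque([seed_node])
    while len(sampled) < target_size:
        if not frontier:
            # The fire died out: restart from a random node not yet sampled.
            remaining = [n for n in graph if n not in sampled]
            if not remaining:
                break
            restart = rng.choice(remaining)
            sampled.add(restart)
            frontier.append(restart)
            continue
        node = frontier.popleft()
        for neighbour in graph[node]:
            if len(sampled) >= target_size:
                break
            if neighbour not in sampled and rng.random() < p_forward:
                sampled.add(neighbour)
                frontier.append(neighbour)
    return sampled


# Hypothetical follower graph with ten users.
graph = {
    "a": ["b", "c", "d"], "b": ["a", "e"], "c": ["a", "f"], "d": ["a", "g"],
    "e": ["b", "h"], "f": ["c", "i"], "g": ["d", "j"], "h": ["e"],
    "i": ["f"], "j": ["g"],
}
# Hypothetical activity attribute: number of posts per user.
activity = {"a": 40, "b": 5, "c": 12, "d": 7, "e": 3,
            "f": 9, "g": 1, "h": 2, "i": 6, "j": 4}

seed = max(activity, key=activity.get)      # attribute based seed selection
target = int(0.3 * len(graph))              # a 30% sample
sample = forest_fire_sample(graph, seed, target, rng=random.Random(0))
print(sorted(sample))
```

Swapping the `max(activity, ...)` line for a random or location based choice gives the other seed selection strategies the abstract compares.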


Coping With Noise in a Real-World Weblog Crawler and Retrieval System

AAAI Conferences

In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach in order to increase the robustness of the algorithm. We then extend DiffPost by looking at the anchor-text to text ratio, and discover that the time-interval between crawls is more important to the successful application of noise-removal algorithms within the blog context than any additional improvements to the removal algorithm itself.


ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews

AAAI Conferences

Sarcasm is a sophisticated form of speech act widely used in online communities. Automatic recognition of sarcasm is, however, a novel task. Sarcasm recognition could contribute to the performance of review summarization and ranking systems. This paper presents SASI, a novel Semi-supervised Algorithm for Sarcasm Identification that recognizes sarcastic sentences in product reviews. SASI has two stages: semi-supervised pattern acquisition, and sarcasm classification. We experimented on a data set of about 66000 Amazon reviews for various books and products. Using a gold standard in which each sentence was tagged by 3 annotators, we obtained precision of 77% and recall of 83.1% for identifying sarcastic sentences. We found some strong features that characterize sarcastic utterances. However, a combination of more subtle pattern-based features proved more promising in identifying the various facets of sarcasm. We also speculate on the motivation for using sarcasm in online communities and social networks.


Trading Strategies to Exploit Blog and News Sentiment

AAAI Conferences

We use quantitative media (blogs, and news as a comparison) data generated by a large-scale natural language processing (NLP) text analysis system to perform a comprehensive and comparative study on how company-related news variables anticipate or reflect the company's stock trading volumes and financial returns. Building on our findings, we give a sentiment-based market-neutral trading strategy which gives consistently favorable returns with low volatility over a long period. Our results are significant in confirming the performance of general blog and news sentiment analysis methods over broad domains and sources. Moreover, several remarkable differences between news and blogs are also identified.
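To make "market-neutral" concrete, here is a minimal sketch of one common long/short construction: go long the highest-sentiment stocks and short the lowest-sentiment ones in equal total amounts, so net market exposure is zero. The abstract does not give the strategy's details, so the ranking rule, the tickers, and the scores below are assumptions for illustration only.

```python
def market_neutral_weights(sentiment, n_each):
    """Long the n_each highest-sentiment tickers, short the n_each lowest.

    Long weights sum to +1 and short weights sum to -1, so the portfolio's
    net exposure is zero. All other tickers get weight 0.
    """
    if n_each < 1 or 2 * n_each > len(sentiment):
        raise ValueError("need 1 <= n_each and at least 2 * n_each tickers")
    ranked = sorted(sentiment, key=sentiment.get)  # lowest sentiment first
    shorts, longs = ranked[:n_each], ranked[-n_each:]
    weights = {ticker: 0.0 for ticker in sentiment}
    for ticker in longs:
        weights[ticker] = 1.0 / n_each
    for ticker in shorts:
        weights[ticker] = -1.0 / n_each
    return weights


# Hypothetical sentiment scores for six made-up tickers.
scores = {"AAA": 0.8, "BBB": -0.5, "CCC": 0.1, "DDD": 0.6, "EEE": -0.9, "FFF": 0.0}
weights = market_neutral_weights(scores, n_each=2)
print(weights)
```

With these scores the sketch goes long AAA and DDD, shorts EEE and BBB, and leaves CCC and FFF out of the portfolio.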


What's Worthy of Comment? Content and Comment Volume in Political Blogs

AAAI Conferences

What makes a blog post noteworthy? One measure of the popularity or breadth of interest of a blog post is the extent to which readers of the blog are inspired to leave comments on the post. In this paper, we study the relationship between the text contents of a blog post and the volume of response it will receive from blog readers. Modeling this relationship has the potential to reveal the interests of a blog's readership community to its authors, readers, advertisers, and scientists [...] In research on blog data, comments are often ignored, and it is easy to see why: comments are very noisy, full of nonstandard grammar and spelling, usually unedited, often cryptic and uninformative, at least to those outside the blog's community. A few studies have focused on information in comments. Mishne and Glance (2006) showed the value of comments in characterizing the social repercussions of a post, including popularity and controversy. Their large-scale user study correlated popularity and comment activity.


Predicting the Speed, Scale, and Range of Information Diffusion in Twitter

AAAI Conferences

We present results of network analyses of information diffusion on Twitter, via users' ongoing social interactions as denoted by "@username" mentions. Incorporating survival analysis, we constructed a novel model to capture the three major properties of information diffusion: speed, scale, and range. On the whole, we find that some properties of the tweets themselves predict greater information propagation but that properties of the users, the rate with which a user is mentioned historically in particular, are equal or stronger predictors. Implications for end users and system designers are discussed.


Longevity in Second Life

AAAI Conferences

The past few years have seen a rise in number and popularity of online spaces where individuals can socialize, play, and learn. All of these spaces face the challenge of retaining the interest of users over time. We study this problem in the context of Second Life (SL). [...] SL also makes it easy to meet and interact with new people. 4. Transaction: Creating content or providing services in SL can be profitable, with 150M USD in user-to-user transactions taking place in the third quarter of 2009 (Linden [...]


Why do Users Tag? Detecting Users' Motivation for Tagging in Social Tagging Systems

AAAI Conferences

While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question: 1.) What motivates users to tag resources, and in what ways is user motivation amenable to quantitative analysis? 2.) Does users' motivation for tagging vary within and across social tagging systems, and if so how? and 3.) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply the measures to datasets from 8 different tagging systems. Our results show that a) users' motivation for tagging varies not only across, but also within tagging systems, and that b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for (i) the development of tag recommenders, (ii) the analysis of tag semantics and (iii) the design of search algorithms for social tagging systems.


Modeling Group Dynamics in Virtual Worlds

AAAI Conferences

In this study, we examine human social interactions within virtual worlds and address the question of how group interactions are affected by the game environment. To investigate this problem, we introduced a set of conversational agents into the social environment of Second Life, a massively multi-player online environment that allows users to construct and inhabit their own 3D world. Our agents were created to be sufficiently lifelike to casual observers, so as not to perturb neighboring social interactions. Using our partitioning algorithm, we separated continuous public chat logs from each region into separate conversations which were used to construct a social network of the participants. Unlike many groups formed in communities and workplaces, groups in Second Life can be rapidly-forming (arising from few interactions), persistent (remaining stable over a long period), and are less affected by socio-cultural influences. In this paper, we analyze regional differences in Second Life by measuring characteristics of the network as a whole, determined from the statistics mined from public conversations in the virtual world, rather than focusing on egocentric actors and their attributes.
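The pipeline of splitting a public chat log into conversations and then building a social network from them can be sketched as below. The abstract does not describe the authors' partitioning algorithm, so this example substitutes a plain time-gap heuristic as a stand-in; the `max_gap` threshold, the log entries, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_conversations(messages, max_gap):
    """Split time-sorted (timestamp, speaker) messages into conversations.

    A new conversation starts whenever the silence between two consecutive
    messages exceeds max_gap seconds (a stand-in heuristic, not the paper's).
    """
    conversations = []
    current = []
    last_time = None
    for timestamp, speaker in messages:
        if last_time is not None and timestamp - last_time > max_gap:
            conversations.append(current)
            current = []
        current.append((timestamp, speaker))
        last_time = timestamp
    if current:
        conversations.append(current)
    return conversations


def build_network(conversations):
    """Count, for each unordered pair of speakers, the conversations they share."""
    edges = Counter()
    for conversation in conversations:
        speakers = sorted({speaker for _, speaker in conversation})
        for pair in combinations(speakers, 2):
            edges[pair] += 1
    return edges


# Hypothetical public chat log for one region: (seconds, avatar name).
log = [(0, "ann"), (20, "bob"), (45, "ann"),
       (400, "cat"), (410, "bob"), (900, "ann")]

conversations = split_conversations(log, max_gap=120)
network = build_network(conversations)
print(len(conversations), dict(network))
```

Network-level statistics such as the number of edges or the degree distribution can then be computed from `network` for each region and compared, rather than looking at individual participants.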