Collaborating Authors

 New Mexico Institute of Mining and Technology


Emerging Topic Detection for Business Intelligence Via Predictive Analysis of 'Meme' Dynamics

AAAI Conferences

Detecting and characterizing emerging topics of discussion and consumer trends through analysis of Internet data is of great interest to businesses. This paper considers the problem of monitoring the Web to spot emerging memes – distinctive phrases which act as “tracers” for topics – as a means of early detection of new topics and trends. We present a novel methodology for predicting which memes will propagate widely, appearing in hundreds or thousands of blog posts, and which will not, thereby enabling discovery of significant topics. We begin by identifying measurables which should be predictive of meme success. Interestingly, these metrics are not those traditionally used for such prediction but instead are subtle measures of meme dynamics. These metrics form the basis for learning a classifier which predicts, for a given meme, whether or not it will propagate widely. The utility of the prediction methodology is demonstrated through analysis of a sample of 200 memes which emerged online during the second half of 2008.


Estimating Sentiment Orientation in Social Media for Business Informatics

AAAI Conferences

Inferring the sentiment of social media content, for instance blog postings or online product reviews, is both of great interest to businesses and technically challenging to accomplish. This paper presents two computational methods for estimating social media sentiment which address the challenges associated with Web-based analysis. Each method formulates the task as one of text classification, models the data as a bipartite graph of documents and words, and assumes that only limited prior information is available regarding the sentiment orientation of any of the documents or words of interest. The first algorithm is a semi-supervised sentiment classifier which combines knowledge of the sentiment labels for a few documents and words with information present in unlabeled data, which is abundant online. The second algorithm assumes the existence of a set of labeled documents in a domain related to the domain of interest, and leverages these data to estimate sentiment in the target domain. We demonstrate the utility of the proposed methods by showing they outperform several standard methods for the task of inferring the sentiment of online reviews of movies, electronics products, and kitchen appliances. Additionally, we illustrate the potential of the methods for multilingual business informatics through a case study involving estimation of Indonesian public opinion regarding the July 2009 Jakarta hotel bombings.
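
To make the bipartite-graph formulation concrete, the following is a minimal sketch of how a few labeled documents and words might spread sentiment to unlabeled ones. It is not the paper's algorithm: the abstract does not give the update rule, so plain neighbor averaging is assumed here, and the function name `propagate_sentiment`, its parameters, and the toy review data are all hypothetical.

```python
from collections import defaultdict


def propagate_sentiment(docs, doc_seeds, word_seeds, iterations=20):
    """Spread seed sentiment over a document-word bipartite graph.

    docs: dict mapping doc id -> list of tokens.
    doc_seeds / word_seeds: dict mapping id -> +1.0 (positive) or -1.0 (negative).
    Assumption: simple neighbor averaging, chosen only for illustration.
    """
    # Build both sides of the bipartite graph (edges link a doc to its words).
    doc_to_words = {d: set(tokens) for d, tokens in docs.items()}
    word_to_docs = defaultdict(set)
    for d, words in doc_to_words.items():
        for w in words:
            word_to_docs[w].add(d)

    # Unlabeled nodes start neutral; seed nodes start at their known label.
    doc_score = {d: doc_seeds.get(d, 0.0) for d in docs}
    word_score = {w: word_seeds.get(w, 0.0) for w in word_to_docs}

    for _ in range(iterations):
        # Each unlabeled word takes the mean score of the docs containing it.
        for w, ds in word_to_docs.items():
            if w not in word_seeds:
                word_score[w] = sum(doc_score[d] for d in ds) / len(ds)
        # Each unlabeled doc takes the mean score of its words.
        for d, ws in doc_to_words.items():
            if d not in doc_seeds and ws:
                doc_score[d] = sum(word_score[w] for w in ws) / len(ws)
    return doc_score, word_score


# Hypothetical toy data: two labeled reviews, two labeled words.
docs = {
    "d1": ["great", "battery", "love"],
    "d2": ["terrible", "battery", "broke"],
    "d3": ["love", "screen"],
    "d4": ["broke", "screen", "terrible"],
}
doc_score, word_score = propagate_sentiment(
    docs,
    doc_seeds={"d1": 1.0, "d2": -1.0},
    word_seeds={"great": 1.0, "terrible": -1.0},
)
```

In this toy run the unlabeled document `d3` ends with a positive score and `d4` with a negative one, because each inherits sentiment through the words it shares with the seeds. Seed nodes are held fixed throughout, which is one common design choice when prior labels are scarce but trusted.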