User generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their blog, message board, etc. posts with respect to topics expressed as a search query. In the scenario we consider the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment and their scores are aggregated (using a data structure we introduce call the sentiment propagation graph) to produce an aggregate score for the input entity. An extremely crucial part in the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions --- this is the focus of our paper. We present an approach where potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines) where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussions blogs annotated with semantic types of mentions, coreference, meronymy and sentiment targets. The corpus proves that coreference and meronymy are not marginal phenomena but are really central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results which we believe push the current state-of-the-art.
Natural language processing technologies have become quite sophisticated over the past few years. From tech giants to hobbyists, many are rushing to build rich interfaces that can analyze, understand, and respond to natural language. Amazon's Alexa, Microsoft's Cortana, Google's Google Home, and Apple's Siri all aim to change the way we interact with computers. Sentiment analysis, a subfield of natural language processing, consists of techniques that determine the tone of a text or speech. Today, with machine learning and large amounts of data harvested from social media and review sites, we can train models to identify the sentiment of a natural language passage with fair accuracy.
Dinakar, Karthik (Massachusetts Institute of Technology) | Jones, Birago (Massachusetts Institute of Technology) | Lieberman, Henry (Massachusetts Institute of Technology) | Picard, Rosalind (Massachusetts Institute of Technology) | Rose, Carolyn (Carnegie Mellon University) | Thoman, Matthew (Northeastern University) | Reichart, Roi (Massachusetts Institute of Technology)
Adolescent cyber-bullying on social networks is a phenomenon that has received widespread attention. Recent work by sociologists has examined this phenomenon under the larger context of teenage drama and it's manifestations on social networks. Tackling cyber-bullying involves two key components – automatic detection of possible cases, and interaction strategies that encourage reflection and emotional support. Key is showing distressed teenagers that they are not alone in their plight. Conventional topic spotting and document classification into labels like "dating" or "sports" are not enough to effectively match stories for this task. In this work, we examine a corpus of 5500 stories from distressed teenagers from a major youth social network. We combine Latent Dirichlet Allocation and human interpretation of its output using principles from sociolinguistics to extract high-level themes in the stories and use them to match new stories to similar ones. A user evaluation of the story matching shows that theme-based retrieval does a better job of finding relevant and effective stories for this application than conventional approaches.
Although sentiment analysis has attracted a lot of research, little work has been done on social media data compared to product and movie reviews. This is due to the low accuracy that results from the more informal writing seen in social media data. Currently, most of sentiment analysis tools on social media choose the lexicon-based approach instead of the machine learning approach because the latter requires the huge challenge of obtaining enough human-labeled training data for extremely large-scale and diverse social opinion data. The lexicon-based approach requires a sentiment dictionary to determine opinion polarity. This dictionary can also provide useful features for any supervised learning method of the machine learning approach. However, many benchmark sentiment dictionaries do not cover the many informal and spoken words used in social media. In addition, they are not able to update frequently to include newly generated words online. In this paper, we present an automatic sentiment dictionary generation method, called Constrained Symmetric Nonnegative Matrix Factorization (CSNMF) algorithm, to assign polarity scores to each word in the dictionary, on a large social media corpus — digg.com. Moreover, we will demonstrate our study of Amazon Mechanical Turk (AMT) on social media word polarity, using both the human-labeled dictionaries from AMT and the General Inquirer Lexicon to compare our generated dictionary with. In our experiment, we show that combining links from both WordNet and the corpus to generate sentiment dictionaries does outperform using only one of them, and the words with higher sentiment scores yield better precision. Finally, we conducted a lexicon-based sentiment analysis on human-labeled social comments using our generated sentiment dictionary to show the effectiveness of our method.
This blog was originally featured on blog.aylien.com, a Text Analysis blog with tutorials, Data Visualisations and industry discussions. Our founder, Parsa Ghaffari, gave a talk recently on Natural Language Processing and Sentiment Analysis at the Science Gallery in Dublin. As part of the talk, he put together a nice little example of how you can transform your Google Spreadsheet into a powerful Text Analysis and Data Mining tool. In this case, he took a simple example of analyzing restaurant reviews from a popular review site but the same could be done for hotels, products, service offerings and so on. He wanted to show how easy it can be for data geeks and even the less technical marketers among us, to start analyzing text and gathering business insight from the reams of textual data online today.