In my previous blog Twitter Sentiment Analysis using Talend, I showed how to extract tweets from Twitter using Talend and then how to do some basic sentiment analysis on those tweets. In this post, I will introduce the Stanford CoreNLP toolkit and show how to integrate it with Talend to perform various NLP (Natural Language Processing) analyses including sentiment analysis. Previously I had managed to perform some basic sentiment analysis on tweets. However, I'd noticed a major flaw with my technique: the method I was using would take each word in a sentence and average the sentiment score of each word. I explain the issue in more detail in my original post, but to give you a flavour of it, I'll show you some examples of correct/incorrect sentiment identification that would result from my previous method: This is incorrect as it should be fairly obvious that this sentence carries negative sentiment.
One of the most difficult challenges reporting and analytics face in public relations measurement is sentiment analysis. Machines attempt textual analysis of sentiment all the time; more often than not, it goes horribly wrong. How does it go wrong? Machines are incapable of understanding context. Machines are typically programmed to look for certain keywords as proxies for sentiment.
Most of the systems on the market will clock anywhere around 55-65% for unseen data, even though they might be 85% accurate in their cross-validations. At this juncture, it's important to realize that sentiment analysis is critical for any system monitoring customer reviews or social media posts. Hardly had the business world caught up with a sentence level sentiment analysis, we are now moving to aspect level sentiment analysis - more directed & granular, adding to the complexity. The question is this - can we do something to augment our sentiment analysis? For the past few months, I have been using context and relationship extraction to augment sentiment analysis.
Sentiment classification provides information about the author's feeling toward a topic through the use of expressive words. However, words indicative of a particular sentiment class can be domain-specific. We train a text classifier for Twitter data related to games using labels inferred from emoticons. Our classifier is able to differentiate between positive and negative sentiment tweets labeled by emoticons with 75.1% accuracy. Additionally, we test the classifier on human-labeled examples with the additional case of neutral or ambiguous sentiment. Finally, we have made the data available to the community for further use and analysis.