In my previous blog post, Twitter Sentiment Analysis using Talend, I showed how to extract tweets from Twitter using Talend and then how to do some basic sentiment analysis on those tweets. In this post, I will introduce the Stanford CoreNLP toolkit and show how to integrate it with Talend to perform various NLP (Natural Language Processing) analyses, including sentiment analysis. Although I had managed to perform some basic sentiment analysis on tweets, I noticed a major flaw in my technique: the method took each word in a sentence and averaged the sentiment scores of the individual words, without considering how the words relate to one another. I explain the issue in more detail in my original post, but to give you a flavour of it, I'll show you some examples of correct/incorrect sentiment identification that would result from my previous method: This is incorrect as it should be fairly obvious that this sentence carries negative sentiment.
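To make the flaw concrete, here is a minimal Python sketch of a word-averaging scorer. It is not the code from the original post: the lexicon, the scores, and the function names are all invented for illustration. It only shows how averaging per-word scores can misread a negated sentence.

```python
# Hypothetical word-level sentiment scores in [-1, 1].
# These values are made up for illustration, not taken from a real lexicon.
WORD_SCORES = {
    "good": 0.7,
    "great": 0.8,
    "love": 0.9,
    "bad": -0.7,
    "terrible": -0.9,
}


def average_word_sentiment(sentence, scores=WORD_SCORES):
    """Average the scores of the words that appear in the lexicon."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scored = [scores[w] for w in words if w in scores]
    # Sentences with no known words are treated as neutral (0.0).
    return sum(scored) / len(scored) if scored else 0.0


def label(score, threshold=0.1):
    """Map a numeric score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in [
        "I love this great phone",      # handled sensibly
        "The battery is terrible",      # handled sensibly
        "The battery is not good",      # negation is ignored by averaging
    ]:
        print(sentence, "->", label(average_word_sentiment(sentence)))
```

The last sentence is labelled positive because only "good" contributes a score and "not" has no effect on it. A model that scores whole phrases rather than isolated words is designed to avoid this kind of mistake.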
If you googled 'How to use Stanford CoreNLP in Python?' and landed on this post, then you already know what it is. For those who don't know, Stanford CoreNLP is open-source software developed by Stanford that provides various Natural Language Processing tools such as stemming, lemmatization, part-of-speech tagging, dependency parsing, sentiment analysis, and entity extraction. Stanford CoreNLP is written in Java. If your application is in Java, you can simply download and import all the needed jars or set it up with Maven. However, I find Python to be more flexible than Java for processing text.
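One common way to reach CoreNLP from Python is to run its bundled HTTP server and send it text. The sketch below assumes a CoreNLP server is already running locally on port 9000 (the address is an assumption, and this may not be the approach the rest of this post takes). The helper names are hypothetical. It uses only the standard library.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, base_url="http://localhost:9000"):
    """Build a POST request for a CoreNLP server (base_url is an assumption)."""
    props = {
        # Annotators needed for sentence-level sentiment.
        "annotators": "tokenize,ssplit,parse,sentiment",
        "outputFormat": "json",
    }
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        f"{base_url}/?{query}",
        data=text.encode("utf-8"),
        method="POST",
    )


def annotate(text, base_url="http://localhost:9000"):
    """Send text to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def sentence_sentiments(response):
    """Extract (sentence index, sentiment label) pairs from a JSON response."""
    return [
        (s.get("index"), s.get("sentiment"))
        for s in response.get("sentences", [])
    ]


if __name__ == "__main__":
    # Requires a running CoreNLP server; fails with a connection error otherwise.
    print(sentence_sentiments(annotate("I hate waiting. The food was great.")))
```

Keeping the request building and response parsing separate from the network call makes both easy to check without a server running.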
Sentiment analysis is extremely useful for gaining an overview of public opinion on particular topics and of the feedback people give. Automatically classifying text by sentiment allows you to easily find out the general opinions of people in your area of interest. For example, you might want to analyze reviews of a product to help you improve the customer experience, or to find the most or least popular product. The Obama administration used sentiment analysis to gauge public opinion on policy announcements and campaign messages ahead of the 2012 presidential election. How can we do this?
In my previous blog post, I showed you how to integrate Stanford CoreNLP with Talend using a simple example. In this post I'll show you how to modify that code in order to make the most of Talend's strengths as a data integration tool. Below is a Talend job I have built to read some tweets from a database (see this blog article for information on how to retrieve tweets with Talend), run the text through the CoreNLP sentiment analysis code, and then write the tweets back to the database with the sentiment added. In this particular example, the text to be analysed consists of tweets coming from a database. However, the same job will work with any string input.
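The read, score, and write-back flow of such a job can be sketched outside Talend in a few lines of Python. This is not the Talend job itself. The `tweets` table, its columns, and `placeholder_classify` are hypothetical stand-ins, and in practice the classifier would be the CoreNLP sentiment code.

```python
import sqlite3


def placeholder_classify(text):
    """Stand-in for the real sentiment call; NOT a real classifier."""
    return "Negative" if "hate" in text.lower() else "Neutral"


def add_sentiment(conn, classify):
    """Read every tweet, classify its text, and write the label back.

    Assumes a hypothetical table tweets(id, text, sentiment).
    Returns the number of rows processed.
    """
    rows = conn.execute("SELECT id, text FROM tweets").fetchall()
    updates = [(classify(text), tweet_id) for tweet_id, text in rows]
    conn.executemany("UPDATE tweets SET sentiment = ? WHERE id = ?", updates)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, sentiment TEXT)"
    )
    conn.executemany(
        "INSERT INTO tweets (text) VALUES (?)",
        [("I hate delays",), ("Train arrives at 9",)],
    )
    print(add_sentiment(conn, placeholder_classify), "tweets updated")
    print(conn.execute("SELECT text, sentiment FROM tweets").fetchall())
```

Because `add_sentiment` takes the classifier as an argument and only ever passes it a string, the same flow applies to any string input, not just tweets.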