In my previous blog post, Twitter Sentiment Analysis using Talend, I showed how to extract tweets from Twitter using Talend and then perform some basic sentiment analysis on them. In this post, I will introduce the Stanford CoreNLP toolkit and show how to integrate it with Talend to perform various NLP (Natural Language Processing) analyses, including sentiment analysis. Previously I had managed to perform some basic sentiment analysis on tweets, but I noticed a major flaw in my technique: the method scored each word in a sentence individually and then averaged those scores across the sentence. I explain the issue in more detail in my original post, but to give you a flavour of it, I'll show you some examples of correct/incorrect sentiment identification that would result from my previous method: This is incorrect, as it should be fairly obvious that this sentence carries negative sentiment.
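To make the flaw concrete, here is a minimal sketch of the word-averaging approach described above. The tiny lexicon and its scores are hypothetical, purely for illustration; the point is that averaging per-word scores ignores negation and word order.

```python
# Hypothetical per-word sentiment lexicon (scores in [-1, 1]).
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0, "not": 0.0}

def naive_sentiment(sentence):
    """Average per-word sentiment scores; unknown words count as neutral (0)."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words)

# "not good" averages 0.0 ("not") and 1.0 ("good") to +0.5,
# wrongly reporting positive sentiment because the negation is ignored.
print(naive_sentiment("not good"))  # 0.5
```

This is exactly the kind of sentence-level context that a compositional model like CoreNLP's sentiment annotator handles, and a bag-of-words average does not.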
If you googled 'How to use Stanford CoreNLP in Python?' and landed on this post, then you already know what it is. For those who don't, Stanford CoreNLP is open-source software developed at Stanford that provides a range of Natural Language Processing tools, such as stemming, lemmatization, part-of-speech tagging, dependency parsing, sentiment analysis, and entity extraction. Stanford CoreNLP is written in Java, so if your application is in Java you can simply download and import all the needed JARs, or set it up with Maven. However, I find Python more flexible than Java for processing text.
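One common way to bridge the two languages is to run the CoreNLP server (shipped with the Java distribution) and call its HTTP API from Python. The sketch below assumes a server is already running on `localhost:9000` (e.g. started with `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000`); the function names here are my own, not part of any library.

```python
import json
import urllib.parse
import urllib.request

def corenlp_url(annotators, host="http://localhost:9000"):
    """Build the annotate URL for the CoreNLP server's HTTP API."""
    props = {"annotators": annotators, "outputFormat": "json"}
    return host + "/?properties=" + urllib.parse.quote(json.dumps(props))

def annotate(text, annotators="tokenize,ssplit,pos,lemma,sentiment"):
    """POST raw text to a running CoreNLP server and return the parsed JSON."""
    req = urllib.request.Request(corenlp_url(annotators),
                                 data=text.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Requires a local CoreNLP server; prints one sentiment label per sentence.
    doc = annotate("Stanford CoreNLP is great!")
    for sentence in doc["sentences"]:
        print(sentence.get("sentiment"))
```

Using plain `urllib` keeps the example dependency-free; in practice you might prefer `requests` or an existing wrapper library for the same API.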
This paper describes the Amobee sentiment analysis system, adapted to compete in SemEval 2017 Task 4. The system consists of two parts: supervised training of RNN models on a Twitter sentiment treebank, and the use of feedforward neural network, Naive Bayes, and logistic regression classifiers to produce predictions for the different sub-tasks. The algorithm ranked 3rd on the 5-label classification task (sub-task C).