What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of the training data than on the type of model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to monitor self- and inter-annotator agreement regularly, since doing so improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
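The abstract does not name the specific agreement measures used; a minimal pure-Python sketch of Cohen's kappa with optional linear weights (a common choice for ordered labels) illustrates why treating the three sentiment classes as ordered matters: adjacent-class disagreements (e.g., neutral vs. positive) are penalized less than negative-vs-positive ones, so the weighted score is higher when annotators disagree only between neighboring classes. The toy annotations below are illustrative, not from the paper's data.

```python
def cohens_kappa(a, b, labels, weights="linear"):
    """Cohen's kappa between two annotators; linear weights treat labels as ordered."""
    n, k = len(a), len(labels)
    idx = {c: i for i, c in enumerate(labels)}
    # observed confusion matrix, as proportions
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1.0 / n
    # marginal distributions of each annotator
    pa = [sum(row) for row in obs]
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # disagreement weight: 0 on the diagonal, growing with class distance if "linear"
    def w(i, j):
        return abs(i - j) / (k - 1) if weights == "linear" else float(i != j)
    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected

labels = ["negative", "neutral", "positive"]
ann1 = ["negative", "neutral", "positive", "positive", "neutral", "negative"]
ann2 = ["negative", "positive", "positive", "neutral", "neutral", "negative"]
# All disagreements here are between adjacent classes, so the linear-weighted
# kappa exceeds the unweighted one.
print(cohens_kappa(ann1, ann2, labels, "linear"))
print(cohens_kappa(ann1, ann2, labels, "none"))
```

The same comparison on real annotations is one way to probe whether annotators perceive the classes as ordered: if weighted kappa consistently exceeds unweighted kappa, disagreements cluster between neighboring classes.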
Sentiment analysis research has predominantly focused on English texts. As a result, many sentiment resources exist for English, but far fewer for other languages. Approaches to improving sentiment analysis in a resource-poor focus language include: (a) translating the focus-language text into a resource-rich language such as English and applying a powerful English sentiment analysis system to it, and (b) translating resources such as sentiment-labeled corpora and sentiment lexicons from English into the focus language and using them as additional resources in the focus-language sentiment analysis system. In this paper we systematically examine both options. We use Arabic social media posts as a stand-in for the focus-language text. We show that sentiment analysis of English translations of Arabic texts produces results competitive with direct Arabic sentiment analysis. We show that Arabic sentiment analysis systems benefit from the use of automatically translated English sentiment lexicons. We also conduct manual annotation studies to examine why the sentiment of a translation differs from the sentiment of the source word or text. This is especially relevant for building better automatic translation systems. In the process, we create a state-of-the-art Arabic sentiment analysis system, a new dialectal Arabic sentiment lexicon, and the first Arabic-English parallel corpus that is independently annotated for sentiment by Arabic and English speakers.
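Option (b), using a translated lexicon as an additional resource, can be sketched in a few lines. The entries below are hypothetical transliterated placeholders, not drawn from the paper's actual lexicon; the scoring rule (average polarity of matched tokens) is the simplest possible baseline, not the paper's system.

```python
# Hypothetical translated lexicon: English polarity entries carried over into
# the focus language (transliterated placeholders, purely for illustration).
translated_lexicon = {
    "jamil": 1.0,    # from "beautiful"
    "mumtaz": 1.0,   # from "excellent"
    "sayyi": -1.0,   # from "bad"
}

def lexicon_score(tokens, lexicon):
    """Average polarity of the tokens found in the lexicon; 0.0 if none match."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

print(lexicon_score(["al-film", "mumtaz"], translated_lexicon))  # → 1.0
```

In practice such a score would typically be one feature among many in a supervised classifier rather than the final prediction, since translation shifts and out-of-lexicon dialectal words limit coverage.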
This paper summarizes our research in the area of semantic tagging at the word and sense levels and lays the groundwork for a new approach to text-level sentiment annotation using a combination of machine learning and linguistically motivated techniques. We describe a system for sentiment tagging of words and senses based on WordNet glosses and advance the treatment of sentiment as a fuzzy category.
Sentiment analysis is a highly subjective and challenging task. Its complexity increases further when it is applied to Arabic, mainly because of the large variety of dialects that are unstandardized and widely used on the Web, especially in social media. While many datasets have been released for training sentiment classifiers in Arabic, most of them contain only shallow annotation, marking just the sentiment of the text unit, such as a word, a sentence, or a document. In this paper, we present the Arabic Sentiment Twitter Dataset for the Levantine dialect (ArSenTD-LEV). Based on findings from analyzing tweets from the Levant region, we created a dataset of 4,000 tweets with the following annotations: the overall sentiment of the tweet, the target to which the sentiment was expressed, how the sentiment was expressed, and the topic of the tweet. Results confirm the importance of these annotations in improving the performance of a baseline sentiment classifier. They also confirm the performance gap that arises when training in one domain and testing in another.
Sentiment analysis is the computational study of opinionated text and is becoming increasingly important to online commercial applications. However, the majority of current approaches determine sentiment by attempting to detect the overall polarity of a sentence, paragraph, or text window, without any knowledge of the entities mentioned (e.g., a restaurant) and their aspects (e.g., price). Aspect-level sentiment analysis of customer feedback data, when done accurately, can be leveraged to understand the strong and weak performance points of businesses and services, and can also support the formulation of critical action steps to improve performance. In this paper we focus on aspect-level sentiment classification, studying the role of opinion context extraction for a given aspect and the extent to which traditional and neural sentiment classifiers benefit when trained on the opinion context text. We propose four methods for aspect context extraction using lexical, syntactic, and sentiment co-occurrence knowledge. Further, we evaluate the usefulness of the opinion contexts for aspect-sentiment analysis. Our experiments on benchmark datasets from SemEval and a real-world dataset from the insurance domain suggest that extracting the right opinion context is effective in improving classification performance. Specifically, combining syntactic features with sentiment co-occurrence knowledge leads to the best aspect-sentiment classification performance.
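The abstract does not spell out the four extraction methods; the simplest plausible variant, a lexical window around the aspect term, might look like the sketch below. The function name, whitespace tokenization, and window size are assumptions for illustration, not the paper's actual method.

```python
def lexical_context(tokens, aspect, window=3):
    """Return the tokens within a fixed window around each mention of the aspect.

    A hypothetical 'lexical' context extractor: the classifier would be trained
    on this snippet instead of the whole sentence.
    """
    positions = [i for i, t in enumerate(tokens) if t.lower() == aspect.lower()]
    context = []
    for i in positions:
        context.extend(tokens[max(0, i - window): i + window + 1])
    return context

tokens = "the price was too high but the food was great".split()
print(lexical_context(tokens, "price"))  # → ['the', 'price', 'was', 'too', 'high']
```

A syntactic variant would instead follow dependency edges from the aspect term to its modifiers, which is presumably why combining syntax with sentiment co-occurrence cues outperforms a fixed window: the relevant opinion words need not be adjacent to the aspect.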