Information Extraction
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
Mozetic, Igor, Grcar, Miha, Smailovic, Jasmina
What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
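As a rough illustration of what an annotator agreement measure computes, here is a minimal sketch of Cohen's kappa for two annotators. Cohen's kappa is one common agreement measure and is used here only as an example; the abstract does not say which measures the authors applied. The function name and the sample labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only (not data from the paper).
annotator_1 = ["negative", "neutral", "positive", "positive", "neutral", "negative"]
annotator_2 = ["negative", "neutral", "positive", "neutral", "neutral", "positive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Note that this plain form of kappa treats the three classes as unordered, so a negative/positive disagreement costs the same as a negative/neutral one. A measure that respects the ordering mentioned in the abstract would weight those disagreements differently.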
Sentiment analysis with machine learning in R
Machine learning makes sentiment analysis more convenient. This post introduces how to do sentiment analysis with machine learning using R. In the R landscape, the sentiment package and the more general text mining package have been well developed by Timothy P. Jurka. You can check out the sentiment package and the fantastic RTextTools package. Timothy also wrote the maxent package for low-memory multinomial logistic regression (also known as maximum entropy).
Sentiment analysis - A case study on Flipkart and Snapdeal on World Book Day - ParallelDots
With big data growing ever bigger and social media penetrating every facet of society, interpreting and monitoring data is one of the biggest challenges enterprises face. Gone are the days when customers had to lodge a formal complaint to report a malfunctioning product or service; users now take to social media to express their dissatisfaction with poor products or services. Inputs such as tweets and Facebook comments can be of significant value to an enterprise in analyzing its products, services, and performance, as well as customer behavior and demands. Below is a small case study of Flipkart's and Snapdeal's performance when 'World Book Day' was trending on Twitter. Below is the screenshot of 'Flipkart' and 'Snapdeal' on the occasion of 'World Book Day'.
Implementing Machine Learning Algorithm On Twitter data
Twitter is an extremely popular online social networking and micro-blogging service. Users communicate through "tweets": short messages of up to 140 characters that express opinions about different topics. The site is a mine of information about users and their interests: their profiles, views, attitudes, observations, the people they follow, and so on. Apart from serving as a channel of communication between family and friends, Twitter is also used for real-time news updates, recommendations, and content sharing. Processing all this information gives marketers and opinion leaders a wealth of knowledge about consumers and their behavior, and enables them to design effective marketing strategies. Join this webinar to learn how to extract, analyze, and use this data by implementing machine learning algorithms on the available information.
K-NN_and_preprocessing
Data preprocessing is an umbrella term that covers an array of operations data scientists use to get their data into a form more appropriate for what they want to do with it. For example, before performing sentiment analysis of Twitter data, you may want to strip out any HTML tags and extra whitespace, expand abbreviations, and split the tweets into lists of the words they contain. When analyzing spatial data, you may scale it so that it is unit-independent, that is, so that your algorithm doesn't care whether the original measurements were in miles or centimeters. However, preprocessing does not occur in a vacuum. It is a means to an end, and there are no hard and fast rules: there are standard practices, as we shall see, and you can develop an intuition for what will work, but in the end preprocessing is generally part of a results-oriented pipeline, and its performance needs to be judged in context.
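The two kinds of preprocessing mentioned above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the article's own code: the abbreviation table, the function names `preprocess_tweet` and `standardize`, and the sample inputs are all hypothetical, and z-score standardization is just one of several ways to make measurements unit-independent.

```python
import html
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"u": "you", "gr8": "great", "idk": "i do not know"}

TAG_RE = re.compile(r"<[^>]+>")          # anything that looks like an HTML tag
TOKEN_RE = re.compile(r"[a-z0-9#@']+")   # word-like tokens, keeping # and @


def preprocess_tweet(text):
    """Strip HTML, normalise case, tokenise, and expand known abbreviations."""
    text = TAG_RE.sub(" ", text)         # remove HTML tags
    text = html.unescape(text)           # decode entities such as &amp;
    text = text.lower()
    tokens = TOKEN_RE.findall(text)      # drops whitespace and punctuation
    words = []
    for token in tokens:
        # An abbreviation may expand to several words.
        words.extend(ABBREVIATIONS.get(token, token).split())
    return words


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


print(preprocess_tweet("<b>U</b> r   gr8 &amp; I luv #python!"))

# The same distances in miles and in centimetres give the same z-scores.
miles = [1.0, 2.0, 3.0]
centimetres = [m * 160934.4 for m in miles]
print(standardize(miles))
print(standardize(centimetres))
```

Whether each step helps depends on the downstream model, which is the point made above: the pipeline has to be judged by its results in context.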
SuperBowl XLIX in Tweets: Sentiment Analysis of 4 Million Tweets
This post was originally published on our Text Analysis blog. It set out to analyze and visualize 4 million tweets collected during Super Bowl XLIX. Not surprisingly, Super Bowl XLIX generated a huge amount of chatter on social networks, with Twitter estimating that over 28.4 million posts were made with terms relating to the Super Bowl. At AYLIEN, we collected just under 4 million tweets from the hashtags, handles, and keywords we were monitoring. To keep our sample clean, we removed any retweets and spam from the tweets collected and worked only with those written in English. We were left with about 3.5 million tweets to play with.
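The cleaning step described above (dropping retweets and keeping only English tweets) could look roughly like the following sketch. It assumes each tweet is a dictionary with hypothetical `text`, `lang`, and `is_retweet` fields. These names are illustrative and are not taken from the post or from any particular API, and spam removal is not covered here.

```python
def keep_tweet(tweet):
    """Return True for original (non-retweet) tweets marked as English."""
    text = tweet.get("text", "")
    # Treat an explicit flag or the conventional "RT @" prefix as a retweet.
    is_retweet = tweet.get("is_retweet", False) or text.startswith("RT @")
    return not is_retweet and tweet.get("lang") == "en"


# Made-up sample records, for illustration only.
sample = [
    {"text": "What a catch!", "lang": "en", "is_retweet": False},
    {"text": "RT @fan: What a catch!", "lang": "en", "is_retweet": False},
    {"text": "Qué partido", "lang": "es", "is_retweet": False},
    {"text": "Halftime show was great", "lang": "en", "is_retweet": True},
]

clean = [tweet for tweet in sample if keep_tweet(tweet)]
print(len(clean), "of", len(sample), "tweets kept")
```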