Sentiment analysis is a powerful example of how machine learning can help developers build better products with unique features. In short, sentiment analysis is the automated process of understanding if text written in a natural language (English, Spanish, etc.) is positive, neutral, or negative about a given subject. Nowadays, we have many instances where people express opinions and sentiment: tweets, comments, reviews, articles, chats, emails and more. One popular example is Twitter, where real-time opinions from millions of users are expressed constantly. Companies use sentiment analysis on Twitter to discover insights about their products and services.
Over years, a crucial part of data-gathering behavior has revolved around what other people think. With the constantly growing popularity and availability of opinion-driven resources such as personal blogs and online review sites, new challenges and opportunities are emerging as people have started using advanced technologies to make decisions now. Sentiment analysis or opinion mining, refers to the use of computational linguistics, text analytics and natural language processing to identify and extract information from source materials. Sentiment analysis is considered one of the most popular applications of text analytics. The primary aspect of sentiment analysis includes data analysis on the body of the text for understanding the opinion expressed by it and other key factors comprising modality and mood.
Recently we walked you through on how to train a sentiment analysis classifier for hotel reviews using Scrapy and MonkeyLearn. This tutorial is a perfect example on how we can combine web scraped data and machine learning for discovering valuable insights about a particular industry. With this model we were able to analyze millions of reviews and understand if guests love or hate different hotels. But besides understanding the sentiment of a review, wouldn't be interesting to understand what particular aspects do the guests love or hate about a particular hotel? This post will cover how you can create a machine learning classifier to understand the different aspects of hotel reviews.
We are currently in an era of data explosion, where millions of tweets, articles, comments, reviews and the like are being published everyday. Developers are taking advantage of the abundance of data and using things like web scraping to do all kinds of cool things. Sometimes web scraping is not enough; digging deeper and analyzing the data is often needed to unlock the true meaning behind the data and discover valuable insights. On this tutorial we will cover how you can use MonkeyLearn and Scrapy to build a machine learning model that will help you analyze vast amounts of web scraped data in a cost-effective way. We will use Scrapy to extract hotel reviews from TripAdvisor and use those reviews as training samples to create a machine learning model with MonkeyLearn.
This paper covers the two approaches for sentiment analysis: i) lexicon based method; ii) machine learning method. We describe several techniques to implement these approaches and discuss how they can be adopted for sentiment classification of Twitter messages. We present a comparative study of different lexicon combinations and show that enhancing sentiment lexicons with emoticons, abbreviations and social-media slang expressions increases the accuracy of lexicon-based classification for Twitter. We discuss the importance of feature generation and feature selection processes for machine learning sentiment classification. To quantify the performance of the main sentiment analysis methods over Twitter we run these algorithms on a benchmark Twitter dataset from the SemEval-2013 competition, task 2-B. The results show that machine learning method based on SVM and Naive Bayes classifiers outperforms the lexicon method. We present a new ensemble method that uses a lexicon based sentiment score as input feature for the machine learning approach. The combined method proved to produce more precise classifications. We also show that employing a cost-sensitive classifier for highly unbalanced datasets yields an improvement of sentiment classification performance up to 7%.