
Text Mining and Sentiment Analysis - A Primer


Over the years, a crucial part of data-gathering behavior has revolved around finding out what other people think. With the constantly growing popularity and availability of opinion-driven resources such as personal blogs and online review sites, new challenges and opportunities are emerging as people increasingly rely on such technologies to make decisions. Sentiment analysis, or opinion mining, refers to the use of computational linguistics, text analytics, and natural language processing to identify and extract subjective information from source materials. It is considered one of the most popular applications of text analytics. At its core, sentiment analysis examines the body of a text to understand the opinion it expresses, along with other key factors such as modality and mood.

A Beginner's Guide to Sentiment Analysis with Python


Sentiment analysis is a technique that detects the underlying sentiment in a piece of text. It is the process of classifying text as either positive, negative, or neutral. Machine learning techniques are used to evaluate a piece of text and determine the sentiment behind it. Sentiment analysis is essential for businesses to gauge customer response. Picture this: Your company has just released a new product that is being advertised on a number of different channels.
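As a rough illustration of the machine learning approach described above, the following is a minimal sketch of a Naive Bayes text classifier written in plain Python. The excerpt does not say which model or library the guide uses, so the choice of Naive Bayes, the function names `train_naive_bayes` and `predict`, and the four toy training reviews are all assumptions made for illustration. The toy data covers only positive and negative labels; a neutral class would need its own labeled examples.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count words per label from (text, label) pairs (hypothetical helper)."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are skipped
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][token] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for this sketch.
training = [
    ("great product love it", "positive"),
    ("excellent quality great value", "positive"),
    ("terrible waste of money", "negative"),
    ("awful quality hate it", "negative"),
]
model = train_naive_bayes(training)
print(predict(model, "great value"))
print(predict(model, "terrible awful"))
```

A real pipeline would train on far more data and would usually add tokenization, stop-word handling, and a held-out test set.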

Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey

The purpose of the study is to investigate the relative effectiveness of four different sentiment analysis techniques: (1) an unsupervised lexicon-based model using SentiWordNet; (2) a traditional supervised machine learning model using logistic regression; (3) a supervised deep learning model using Long Short-Term Memory (LSTM); and (4) an advanced supervised deep learning model using Bidirectional Encoder Representations from Transformers (BERT). We use a publicly available labeled corpus of 50,000 movie reviews originally posted on the Internet Movie Database (IMDb) for analysis using the SentiWordNet lexicon, logistic regression, LSTM, and BERT. The first three models were run on a CPU-based system, whereas BERT was run on a GPU-based system. Sentiment classification performance was evaluated based on accuracy, precision, recall, and F1 score. The study puts forth two key insights: (1) the relative efficacy of four highly advanced and widely used sentiment analysis techniques; and (2) the undisputed superiority of the pre-trained advanced supervised deep learning BERT model in sentiment analysis from text data. This study provides professionals in the analytics industry and academicians working on text analysis with key insight into the comparative classification performance of key sentiment analysis techniques, including the recently developed BERT. This is the first research endeavor to compare the advanced pre-trained supervised deep learning model of BERT vis-à-vis the other sentiment analysis models of LSTM, logistic regression, and SentiWordNet.
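The four evaluation measures named in the abstract can be computed from a confusion matrix. The sketch below shows one way to do this for a binary positive/negative task in plain Python. The function name `binary_metrics` and the eight toy labels are invented for illustration and are not taken from the study, which may well have used a library implementation instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (1 = positive review, 0 = negative review), invented for this sketch.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # 3 TP, 1 FP, 1 FN, 3 TN, so every measure equals 0.75
```

Reporting precision and recall alongside accuracy matters most when classes are imbalanced, since accuracy alone can look high for a model that mostly predicts the majority class.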

Aspect Term Extraction using Graph-based Semi-Supervised Learning

Aspect-based sentiment analysis is a major subarea of sentiment analysis. Many supervised and unsupervised approaches have been proposed in the past for detecting and analyzing the sentiment of aspect terms. In this paper, a graph-based semi-supervised learning approach for aspect term extraction is proposed. In this approach, every identified token in the review document is classified as an aspect or non-aspect term from a small set of labeled tokens using the label spreading algorithm. k-Nearest Neighbor (kNN) graph sparsification is employed in the proposed approach to make it more time- and memory-efficient. The proposed work is further extended to determine the polarity of the opinion words associated with the identified aspect terms in a review sentence, in order to generate a visual aspect-based summary of review documents. The experimental study is conducted on benchmark and crawled datasets from the restaurant and laptop domains with varying numbers of labeled instances. The results show that the proposed approach can achieve good results in terms of precision, recall, and accuracy with limited availability of labeled data.
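To make the idea of label spreading over a kNN-sparsified graph more concrete, here is a minimal plain-Python sketch. It is not the authors' implementation: the 2-D points standing in for token feature vectors, the function names, and the values of `k`, `alpha`, and `iters` are all assumptions chosen for illustration. The update rule is the standard one, F = alpha * S * F + (1 - alpha) * Y, where S is the symmetrically normalized adjacency matrix and Y holds the known labels.

```python
import math

def knn_graph(points, k):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest neighbours."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w[i][j] = w[j][i] = 1.0  # keep the graph undirected
    return w

def label_spreading(points, labels, n_classes, k=2, alpha=0.8, iters=50):
    """Spread the few known labels (-1 means unlabeled) over the kNN graph."""
    n = len(points)
    w = knn_graph(points, k)
    deg = [sum(row) for row in w]
    # Normalized affinity S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if w[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # Y is one-hot for labeled points and all zeros for unlabeled ones.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][j] * f[j][c] for j in range(n))
              + (1 - alpha) * y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]

# Hypothetical token features: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
# Only one token per group is labeled (0 = non-aspect, 1 = aspect).
labels = [0, -1, -1, -1, 1, -1, -1, -1]
predicted = label_spreading(points, labels, n_classes=2)
print(predicted)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Keeping only each node's k nearest neighbours means the affinity matrix has on the order of n * k non-zero entries instead of n * n, which is where the time and memory savings mentioned in the abstract come from. The dense lists used here are for readability only; a sparse matrix would be needed to realize that saving.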

Pars-ABSA: An Aspect-based Sentiment Analysis Dataset in Persian

Due to the increased availability of online reviews, sentiment analysis has witnessed booming interest from researchers. Sentiment analysis is a computational treatment of sentiment used to extract and understand the opinions of authors. While many systems have been built to predict the sentiment of a document or a sentence, many others provide the necessary detail on various aspects of the entity (i.e., aspect-based sentiment analysis). Most of the available data resources are tailored to English and other popular European languages. Although Persian is a language with more than 110 million speakers, to the best of our knowledge, there is no public dataset for aspect-based sentiment analysis in Persian. This paper provides a manually annotated Persian dataset, Pars-ABSA, which is verified by 3 native Persian speakers. The dataset consists of 5114 positive, 3061 negative, and 1827 neutral data samples from 5602 unique reviews. Moreover, as a baseline, this paper reports the performance of some state-of-the-art aspect-based sentiment analysis methods, with a focus on deep learning, on Pars-ABSA. The obtained results are impressive compared to similar state-of-the-art results on English.
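The class counts quoted in the abstract can be turned into a quick summary of the dataset's label distribution. The short sketch below derives the total sample count, the share of each class, and the average number of samples per review. These derived figures are computed here from the stated counts and are not reported in the abstract itself. The majority-class baseline is likewise an added illustration, not a result from the paper.

```python
# Counts as stated in the abstract.
class_counts = {"positive": 5114, "negative": 3061, "neutral": 1827}
unique_reviews = 5602

total_samples = sum(class_counts.values())  # 10002
shares = {label: count / total_samples for label, count in class_counts.items()}
samples_per_review = total_samples / unique_reviews

# Accuracy of a trivial classifier that always predicts the most common class.
majority_label = max(class_counts, key=class_counts.get)
majority_baseline = shares[majority_label]

print(total_samples)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"samples per review: {samples_per_review:.2f}")
print(f"majority baseline ({majority_label}): {majority_baseline:.1%}")
```

Because a little over half the samples are positive, any reported accuracy on this dataset is best read against that baseline rather than against chance on three balanced classes.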