Automated sentiment analysis and opinion mining is a complex process concerned with extracting useful subjective information from text. The explosion of user-generated content on the Web, and especially the fact that millions of users express their opinions on products and services every day on blogs, wikis, social networks, message boards, etc., makes the reliable, automated extraction of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (based on Term Frequency-Inverse Document Frequency) with a Bag-of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed by applying several machine learning classification algorithms to a dataset that is used extensively in the literature for sentiment detection. The classification accuracy obtained with the proposed hybrid vectorization approach is higher than when its individual components are used for text representation, and is comparable with state-of-the-art sentiment detection methodologies.
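The hybrid representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny embedding table, the sentiment lexicon, the smoothed IDF formula, and all function names below are assumptions made for illustration. In practice the word vectors would come from a trained Word2Vec model and the sentiment values from a published lexicon. The sketch concatenates three parts for each document: a TF-IDF-weighted average of word vectors, Bag-of-Words counts, and two lexicon-based sentiment sums.

```python
import math
from collections import Counter

# Hypothetical toy inputs; real systems would load trained Word2Vec
# vectors and a published sentiment lexicon instead.
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "bad": [-0.8, 0.2],
    "movie": [0.0, 1.0],
    "plot": [0.1, 0.8],
}
LEXICON = {"good": 1.0, "bad": -1.0}


def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def weighted_embedding(tokens, idf, dim):
    """Average of word vectors, each weighted by its TF-IDF score."""
    tf = Counter(tokens)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        if tok not in EMBEDDINGS or tok not in idf:
            continue  # skip words without a vector or an IDF value
        w = (count / len(tokens)) * idf[tok]
        for i, x in enumerate(EMBEDDINGS[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0:
        return total
    return [x / weight_sum for x in total]


def bag_of_words(tokens, vocab):
    """Raw term counts over a fixed, ordered vocabulary."""
    tf = Counter(tokens)
    return [float(tf[t]) for t in vocab]


def lexicon_features(tokens):
    """Sum of positive and sum of negative lexicon scores."""
    scores = [LEXICON.get(t, 0.0) for t in tokens]
    pos = sum(s for s in scores if s > 0)
    neg = sum(s for s in scores if s < 0)
    return [pos, neg]


def hybrid_vector(tokens, idf, vocab, dim=2):
    """Concatenate the three component representations."""
    return (
        weighted_embedding(tokens, idf, dim)
        + bag_of_words(tokens, vocab)
        + lexicon_features(tokens)
    )


docs = [["good", "movie"], ["bad", "plot"], ["good", "plot", "movie"]]
idf = idf_table(docs)
vocab = sorted(idf)
features = [hybrid_vector(d, idf, vocab) for d in docs]
```

Each entry of `features` can then be passed to any standard classifier. Concatenation is used here because it keeps the three components separable, so each one can also be evaluated on its own, as the abstract reports.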
User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., “food” and “service” in restaurant reviews) is an important task in sentiment analysis and opinion mining. Given a predefined set of aspect categories, most previous research leverages hand-crafted features and a classification algorithm to accomplish the task. The crucial step for achieving better performance is feature engineering, which consumes much human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach that automatically learns useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations from a large set of reviews with noisy labels. We then generate deeper, hybrid features through neural networks stacked on the word vectors. Finally, a logistic regression classifier is trained on the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves state-of-the-art performance and outperforms the best participating team as well as several strong baselines.
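The final stage of this pipeline, one logistic regression classifier per aspect category trained on sentence-level features, can be sketched as follows. This is a simplified illustration rather than the paper's method: the word vectors are hand-made stand-ins for the learned semi-supervised embeddings, and a plain average of word vectors replaces the neural-network-derived hybrid features. The two training sentences, the hyperparameters, and all names are likewise hypothetical.

```python
import math

# Hypothetical 2-d word vectors; the paper learns its embeddings from a
# large set of reviews with noisy labels.
WORD_VECTORS = {
    "pizza": [1.0, 0.0],
    "tasty": [0.9, 0.1],
    "waiter": [0.0, 1.0],
    "rude": [0.1, 0.9],
}
CATEGORIES = ["food", "service"]


def sentence_features(tokens):
    """Average the vectors of known words (stand-in for hybrid features)."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary logistic regression fitted with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - target  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy training data: each sentence carries a set of aspect categories.
TRAIN = [
    (["tasty", "pizza"], {"food"}),
    (["rude", "waiter"], {"service"}),
]
X = [sentence_features(tokens) for tokens, _ in TRAIN]

# One-vs-rest: an independent binary classifier per category, so a
# sentence may be assigned several categories at once.
MODELS = {
    c: train_logreg(X, [1.0 if c in labels else 0.0 for _, labels in TRAIN])
    for c in CATEGORIES
}


def aspect_scores(tokens):
    """Per-category probabilities for one tokenized sentence."""
    x = sentence_features(tokens)
    return {c: predict_proba(MODELS[c], x) for c in CATEGORIES}
```

Calling `aspect_scores(["pizza"])` returns one probability per category. A category is predicted when its probability exceeds a chosen threshold such as 0.5.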
Sentiment analysis is a gateway to AI-based text analysis. For any company or data scientist looking to extract meaning from an unstructured text corpus, sentiment analysis is one of the first steps, giving a high RoI in additional insights for a relatively low investment of time and effort. With an explosion of text data available in digital formats, the need for sentiment analysis and other NLU techniques for analyzing this data is growing rapidly. Sentiment analysis looks relatively simple and works very well today, but we have reached this point only after significant efforts by researchers who have invented different approaches and tried numerous models. In the chart above, we give the reader a snapshot of the different approaches tried and their corresponding accuracy on the IMDB dataset.