Text Classification and Sentiment Analysis


For a more technical explanation, see this and this article. Here you can find a good explanation, as well as a list of the most commonly used kernel functions.

Deep Sentiment Analysis using a Graph-based Text Representation

arXiv.org Machine Learning

Accordingly, a prime step in text mining applications is to extract interesting patterns and features from this supply of unstructured data. Feature extraction can be considered the core of social media mining tasks such as sentiment analysis, event detection, and news recommendation [2]. In the literature, sentiment analysis usually refers to the task of classifying the polarity of a given piece of text at the document, sentence, feature, or aspect level [23]. Sentiment analysis is applied in a variety of domains: examples include analyzing political reviews to estimate the general viewpoint of the parties [43], predicting stock market prices from financial news data [5], and recognizing the current medical and psychological status of a community [23]. Machine learning algorithms and statistical learning techniques have been on the rise in a variety of scientific fields [9, 10], and a number of machine learning techniques have been proposed for sentiment analysis. As one of the most powerful sub-domains of machine learning in recent years, deep learning is emerging as a persuasive computational tool; it has affected many research areas and can be found in many applications. In deep learning, deep text representation models attempt to discover and present intricate syntactic and semantic representations of texts automatically from data, without any handcrafted feature engineering.

Deep Learning Sentiment Analysis of Amazon.com Reviews and Ratings

arXiv.org Machine Learning

Our study employs sentiment analysis to evaluate the compatibility of Amazon.com reviews with their corresponding ratings. Sentiment analysis is the task of identifying and classifying the sentiment expressed in a piece of text as positive or negative. On e-commerce websites such as Amazon.com, consumers can submit their reviews along with a specific polarity rating. In some instances, there is a mismatch between the review and the rating. To identify reviews with mismatched ratings, we performed sentiment analysis using deep learning on Amazon.com product review data. Product reviews were converted to vectors using paragraph vectors, which were then used to train a recurrent neural network with gated recurrent units. Our model incorporated both the semantic relationships in the review text and product information. We also developed a web service application that uses the trained model to predict the rating score for a submitted review; if the predicted score and the submitted score do not match, it provides feedback to the reviewer.
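To make the two mechanisms in this abstract concrete, here is a minimal sketch of a single gated recurrent unit update and of a rating mismatch check. It is an illustration, not the authors' implementation: the function names, the `tolerance` parameter, and the weight layout are assumptions, and the update uses one common GRU convention, h' = (1 - z) * h + z * candidate (some papers swap the roles of z and 1 - z).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU update of hidden state h given input x.

    `params` maps a gate name ("update", "reset", "candidate") to a
    (W, U, b) triple; this layout is an assumption made for the sketch.
    """
    def gate(name, hidden, activation):
        W, U, b = params[name]
        pre = [wx + uh + bias
               for wx, uh, bias in zip(matvec(W, x), matvec(U, hidden), b)]
        return [activation(p) for p in pre]

    z = gate("update", h, sigmoid)   # how much of the state to overwrite
    r = gate("reset", h, sigmoid)    # how much of the old state feeds the candidate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = gate("candidate", gated_h, math.tanh)
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def is_mismatch(predicted_rating, submitted_rating, tolerance=1):
    """Flag a review whose predicted rating is far from the submitted one.

    The tolerance of one star is a hypothetical threshold, not a value
    taken from the paper.
    """
    return abs(predicted_rating - submitted_rating) > tolerance
```

In a full model, `gru_step` would be applied repeatedly over the input sequence and the final hidden state passed to an output layer that predicts the rating; `is_mismatch` would then decide whether to give the reviewer feedback.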

Machine Learning Sentiment Prediction based on Hybrid Document Representation

arXiv.org Machine Learning

Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user-generated content on the Web, especially the fact that millions of users express their opinions on products and services daily on blogs, wikis, social networks, message boards, etc., renders the reliable, automated extraction of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (weighted by Term Frequency-Inverse Document Frequency) with a Bag-of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed through the application of several machine learning classification algorithms on a dataset that is used extensively in the literature for sentiment detection. The classification accuracy derived through the proposed hybrid vectorization approach is higher than when its individual components are used for text representation, and comparable with state-of-the-art sentiment detection methodologies.
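The hybrid representation described above can be sketched as the concatenation of three feature groups. The following is a minimal illustration under stated assumptions, not the paper's implementation: the corpus, the two-dimensional `EMBEDDINGS` (a stand-in for trained Word2Vec vectors), the `LEXICON`, and the plain log(N / df) weighting are all hypothetical choices made to keep the example self-contained.

```python
import math
from collections import Counter

# Toy corpus, embeddings, and lexicon: hypothetical values for illustration.
DOCS = [
    "great phone great battery".split(),
    "terrible battery".split(),
    "great screen".split(),
]
EMBEDDINGS = {  # stand-in for trained Word2Vec vectors
    "great": [1.0, 0.0],
    "terrible": [-1.0, 0.0],
    "phone": [0.0, 1.0],
    "battery": [0.0, 0.5],
    "screen": [0.0, -0.5],
}
LEXICON = {"great": 1.0, "terrible": -1.0}  # word -> polarity score
VOCAB = sorted({word for doc in DOCS for word in doc})


def idf(word):
    """Inverse document frequency, log(N / df); 0.0 for unseen words."""
    df = sum(1 for doc in DOCS if word in doc)
    return math.log(len(DOCS) / df) if df else 0.0


def hybrid_vector(tokens):
    counts = Counter(tokens)

    # 1) TF-IDF-weighted average of the word embeddings.
    dim = len(next(iter(EMBEDDINGS.values())))
    weighted = [0.0] * dim
    total_weight = 0.0
    for word, count in counts.items():
        if word not in EMBEDDINGS:
            continue
        weight = (count / len(tokens)) * idf(word)
        total_weight += weight
        for i, value in enumerate(EMBEDDINGS[word]):
            weighted[i] += weight * value
    if total_weight > 0:
        weighted = [v / total_weight for v in weighted]

    # 2) Bag-of-Words counts over the corpus vocabulary.
    bow = [float(counts[word]) for word in VOCAB]

    # 3) Lexicon-based sentiment feature (here a single summed score).
    lexicon = [sum(LEXICON.get(word, 0.0) for word in tokens)]

    return weighted + bow + lexicon


vector = hybrid_vector(DOCS[1])  # features for "terrible battery"
```

The resulting vector can be passed to any standard classifier. Concatenation keeps each component inspectable, which also makes it easy to compare the hybrid against its individual parts.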

Representation Learning for Aspect Category Detection in Online Reviews

AAAI Conferences

User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., “food” and “service” in restaurant reviews) is an important task in sentiment analysis and opinion mining. Given a predefined aspect category set, most previous research leverages hand-crafted features and a classification algorithm to accomplish the task. The crucial step in achieving better performance is feature engineering, which consumes much human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach to automatically learn useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations from a large set of reviews with noisy labels. Afterwards, we propose to generate deeper and hybrid features through neural networks stacked on the word vectors. Finally, a logistic regression classifier is trained with the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.
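The final stage of this pipeline, a logistic regression classifier over sentence features, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the paper derives its features from neural networks stacked on semi-supervised embeddings, whereas this sketch substitutes averaged toy word vectors. `WORD_VECTORS`, the training sentences, and the 0.5 decision threshold are all hypothetical. One binary classifier is trained per aspect category, so a sentence can receive several categories.

```python
import math

# Hypothetical 2-d word vectors standing in for learned embeddings.
WORD_VECTORS = {
    "pizza": [1.0, 0.0],
    "pasta": [0.9, 0.1],
    "waiter": [0.0, 1.0],
    "staff": [0.1, 0.9],
}
TRAINING = [  # (sentence, aspect categories): toy data
    ("tasty pizza", {"food"}),
    ("pasta", {"food"}),
    ("rude waiter", {"service"}),
    ("friendly staff", {"service"}),
    ("pizza waiter", {"food", "service"}),
]
CATEGORIES = ["food", "service"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sentence_vector(sentence):
    """Average the vectors of known words; zeros if none are known."""
    known = [WORD_VECTORS[w] for w in sentence.split() if w in WORD_VECTORS]
    if not known:
        return [0.0, 0.0]
    return [sum(column) / len(known) for column in zip(*known)]


def train_logistic(features, labels, epochs=200, lr=0.5):
    """Binary logistic regression fitted with stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


FEATURES = [sentence_vector(sentence) for sentence, _ in TRAINING]
MODELS = {
    category: train_logistic(
        FEATURES, [1 if category in tags else 0 for _, tags in TRAINING]
    )
    for category in CATEGORIES
}


def detect_aspects(sentence):
    """Return every category whose classifier scores above 0.5."""
    x = sentence_vector(sentence)
    return {c for c in CATEGORIES if predict_proba(MODELS[c], x) > 0.5}
```

Calling `detect_aspects("pizza")` or `detect_aspects("waiter")` exercises the two classifiers on unambiguous inputs. Swapping `sentence_vector` for a learned feature extractor is the part the paper's representation learning approach addresses.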