unsupervised sentiment analysis
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts
The massive collection of user posts across social media platforms remains largely untapped for artificial intelligence (AI) use cases because of the sheer volume and velocity of textual data. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. Using a word ranking method, term frequency-inverse document frequency (TF-IDF), to create features across documents, it is possible to perform unsupervised analytics, that is, machine learning (ML) that can group the documents without a human manually labeling the data. For large datasets with thousands of features, t-distributed stochastic neighbor embedding (t-SNE), k-means clustering, and latent Dirichlet allocation (LDA) are employed to learn top words and generate topics for a combined Reddit and Twitter corpus. Using extremely simple deep learning models, this study demonstrates that the applied results of unsupervised analysis allow a computer to predict negative, positive, or neutral user sentiment towards plastic surgery from a tweet or subreddit post with almost 90% accuracy. Furthermore, the model achieves higher accuracy on the unsupervised sentiment task than on a rudimentary supervised document classification task. Therefore, unsupervised learning may be considered a viable option for labeling social media documents for NLP tasks.
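As a rough illustration of the TF-IDF word ranking mentioned in this abstract, the sketch below scores terms in a tiny corpus using only the Python standard library. The abstract does not state which TF-IDF variant the study uses, so the weighting here (term count divided by document length, times the natural log of the inverse document frequency), the whitespace tokenizer, and the helper names `tf_idf` and `top_terms` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=3):
    """Return the k highest-weighted terms per document (ties broken alphabetically)."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in weights]


if __name__ == "__main__":
    # Hypothetical toy posts, not data from the study.
    posts = ["surgery went great", "surgery was painful", "great results"]
    print(top_terms(tf_idf(posts), k=1))
```

Terms that appear in every document receive a weight of zero under this variant, which is why distinctive words rise to the top of each ranking. The resulting per-document weight vectors are the kind of features that clustering or topic models could then group without labels.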
Lex2Sent: A bagging approach to unsupervised sentiment analysis
Lange, Kai-Robin, Rieger, Jonas, Jentsch, Carsten
Unsupervised sentiment analysis is traditionally performed by counting those words in a text that are stored in a sentiment lexicon and then assigning a label depending on the proportion of positive and negative words registered. While these "counting" methods are considered to be beneficial as they rate a text deterministically, their classification rates decrease when the analyzed texts are short or the vocabulary differs from what the lexicon considers default. The model proposed in this paper, called Lex2Sent, is an unsupervised sentiment analysis method to improve the classification of sentiment lexicon methods. For this purpose, a Doc2Vec model is trained to determine the distances between document embeddings and the embeddings of the positive and negative parts of a sentiment lexicon. These distances are then evaluated for multiple executions of Doc2Vec on resampled documents and are averaged to perform the classification task. For three benchmark datasets considered in this paper, the proposed Lex2Sent outperforms every evaluated lexicon, including state-of-the-art lexica like VADER or the Opinion Lexicon, in terms of classification rate.
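The traditional "counting" baseline that this abstract describes can be sketched in a few lines of Python. This is not Lex2Sent itself, which relies on Doc2Vec embeddings and resampling; it only illustrates the lexicon method that Lex2Sent aims to improve on. The two word sets are a hypothetical toy lexicon, and the function name `lexicon_label` and the tie-breaking rule (equal counts yield "neutral") are assumptions.

```python
import string

# Hypothetical toy lexicon for illustration; real lexica are far larger.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "awful", "sad", "hate"}


def lexicon_label(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a text by comparing counts of positive and negative lexicon words."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # assumed rule for ties, including texts with no lexicon words


if __name__ == "__main__":
    print(lexicon_label("I love it, great!"))
    print(lexicon_label("The recovery was rough"))
```

The second call shows the weakness noted in the abstract: a short text whose vocabulary falls outside the lexicon gives the counter nothing to work with, so it falls back to the default label.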
Unsupervised Sentiment Analysis with Signed Social Networks
Cheng, Kewei (Arizona State University) | Li, Jundong (Arizona State University) | Tang, Jiliang (Michigan State University) | Liu, Huan (Arizona State University)
Huge volumes of opinion-rich data are user-generated in social media at an unprecedented rate, easing the analysis of individual and public sentiments. Sentiment analysis has been shown to be useful in probing and understanding emotions, expressions and attitudes in text. However, the distinct characteristics of social media data present challenges to traditional sentiment analysis. First, social media data is often noisy, incomplete and fast-evolving, which necessitates the design of a sophisticated learning model. Second, sentiment labels are hard to collect, which further exacerbates the problem by making it difficult to discriminate sentiment polarities. Meanwhile, opportunities are also unequivocally presented. Social media contains rich sources of sentiment signals in textual terms and user interactions, which could be helpful in sentiment analysis. While there are some attempts to leverage implicit sentiment signals in positive user interactions, little attention has been paid to signed social networks with both positive and negative links. The availability of signed social networks motivates us to investigate whether negative links also contain useful sentiment signals. In this paper, we study a novel problem of unsupervised sentiment analysis with signed social networks. In particular, we incorporate explicit sentiment signals in textual terms and implicit sentiment signals from signed social networks into a coherent model, SignedSenti, for unsupervised sentiment analysis. Empirical experiments on two real-world datasets corroborate its effectiveness.
Unsupervised Sentiment Analysis for Social Media Images
Wang, Yilin (Arizona State University) | Wang, Suhang (Arizona State University) | Tang, Jiliang (Arizona State University) | Liu, Huan (Arizona State University) | Li, Baoxin (Arizona State University)
Recently, text-based sentiment prediction has been extensively studied, while image-centric sentiment analysis receives much less attention. In this paper, we study the problem of understanding human sentiments from large-scale social media images, considering both visual content and contextual information, such as comments on the images, captions, etc. The challenge of this problem lies in the "semantic gap" between low-level visual features and higher-level image sentiments. Moreover,
Current methods of sentiment analysis for social media images include low-level visual feature based approaches [Jia et al., 2012; Yang et al., 2014], mid-level visual feature based approaches [Borth et al., 2013; Yuan et al., 2013] and deep learning based approaches [You et al., 2015]. The vast majority of existing methods are supervised, relying on labeled images to train sentiment classifiers. Unfortunately, sentiment labels are in general unavailable for social media images, and it is too labor- and time-intensive to obtain labeled sets large enough for robust training. In order to utilize the vast amount of unlabeled social media images, an unsupervised approach would be much more desirable.