Discourse & Dialogue
iFeel 2.0: A Multilingual Benchmarking System for Sentence-Level Sentiment Analysis
Araujo, Matheus Lima Diniz (Federal University of Minas Gerais) | Diniz, João Paulo (Federal University of Minas Gerais) | Bastos, Lucas (Federal University of Minas Gerais) | Soares, Elias (Federal University of Minas Gerais) | Junior, Manoel (Federal University of Minas Gerais) | Ferreira, Miller (Federal University of Minas Gerais) | Ribeiro, Filipe (Federal University of Ouro Preto) | Benevenuto, Fabrício (Federal University of Minas Gerais)
Sentiment analysis has become a hot topic, especially given the amount of opinions available in social media data. With the increasing interest in this theme, several methods have been proposed in the literature. Recent efforts have shown that no single method always achieves the best prediction performance across different datasets. Additionally, novel methods have not been extensively compared with other methods and across different datasets, especially methods that are not designed for the English language. Consequently, researchers tend to accept any popular method as a valid methodology to measure sentiments, a practice that is usual in science. In this context, we propose iFeel 2.0, an online web system that implements 19 sentence-level sentiment analysis methods and allows users to easily label a dataset with all of them. iFeel aims to ease the comparison of new methods with baseline approaches. It can also help those interested in using sentiment analysis to choose a method that works well for a new dataset. We also incorporate a multilingual feature that allows methods designed for specific languages to be easily compared with a baseline approach that simply translates the input data into English and runs these 19 methods. We hope this system represents an important contribution to the field.
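A minimal sketch of the kind of comparison iFeel is meant to ease: score several methods' outputs against a small gold-labeled dataset and rank them by macro-averaged F1. The method names, labels and predictions are hypothetical, and this is not iFeel's interface. It only assumes that each method returns one of three polarity labels per sentence.

```python
LABELS = ("negative", "neutral", "positive")


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def rank_methods(gold, predictions):
    """Return (method, macro-F1) pairs sorted from best to worst."""
    results = {name: macro_f1(gold, pred) for name, pred in predictions.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical gold labels and outputs of two hypothetical methods.
    gold = ["positive", "negative", "neutral", "positive"]
    predictions = {
        "method_a": ["positive", "negative", "neutral", "negative"],
        "method_b": ["positive", "positive", "positive", "positive"],
    }
    for name, score in rank_methods(gold, predictions):
        print(f"{name}: {score:.3f}")
```

Macro-averaging weights each polarity class equally, so a method that labels everything "positive" is not rewarded just because positive sentences dominate a dataset.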
Comparing Overall and Targeted Sentiments in Social Media during Crises
Vargas, Saul (University of Glasgow) | McCreadie, Richard (University of Glasgow) | Macdonald, Craig (University of Glasgow) | Ounis, Iadh (University of Glasgow)
The tracking of citizens' reactions in social media during crises has attracted an increasing level of interest in the research community. In particular, sentiment analysis over social media posts can be regarded as a particularly useful tool, enabling civil protection and law enforcement agencies to more effectively respond during this type of situation. Prior work on sentiment analysis in social media during crises has applied well-known techniques for overall sentiment detection in posts. However, we argue that sentiment analysis of the overall post might not always be suitable, as it may miss the presence of more targeted sentiments, e.g. about the people and organizations involved (which we refer to as sentiment targets). Through a crowdsourcing study, we show that there are marked differences between the overall tweet sentiment and the sentiment expressed towards the subjects mentioned in tweets related to three crisis events.
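One simple way such differences could be quantified once crowdsourced labels are available is to count how often the label for a sentiment target differs from the tweet's overall label. This is a minimal sketch. The data layout, tweet ids and targets are hypothetical assumptions for illustration, not the study's actual protocol.

```python
def disagreement_rate(annotations):
    """Fraction of (tweet, target) pairs whose targeted sentiment differs
    from the overall sentiment of the tweet.

    `annotations` is a hypothetical layout mapping a tweet id to a tuple
    (overall_label, {target: target_label}).
    """
    pairs = 0
    differing = 0
    for overall, targets in annotations.values():
        for target_label in targets.values():
            pairs += 1
            if target_label != overall:
                differing += 1
    return differing / pairs if pairs else 0.0


if __name__ == "__main__":
    annotations = {
        "t1": ("negative", {"police": "positive", "attacker": "negative"}),
        "t2": ("neutral", {"council": "negative"}),
    }
    print(disagreement_rate(annotations))
```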
Fusing Audio, Textual, and Visual Features for Sentiment Analysis of News Videos
Pereira, Moisés Henrique Ramos (University Center of Belo Horizonte (UNI-BH)) | Pádua, Flávio Luis Cardeal (Federal Center for Technological Education of Minas Gerais (CEFET-MG)) | Pereira, Adriano César Machado (Federal University of Minas Gerais (UFMG)) | Benevenuto, Fabrício (Federal University of Minas Gerais (UFMG)) | Dalip, Daniel Hasan (University Center of Belo Horizonte (UNI-BH))
This paper presents a novel approach to sentiment analysis of news videos, based on the fusion of audio, textual and visual cues extracted from their contents. The proposed approach aims to contribute to the semio-discursive study of the construction of the ethos (identity) of this media universe, which has become a central part of the daily lives of millions of people. To achieve this goal, we apply state-of-the-art computational methods for (1) automatic emotion recognition from facial expressions, (2) extraction of modulations in the participants' speech and (3) sentiment analysis of the closed captions associated with the videos of interest. More specifically, we compute features such as the visual intensities of recognized emotions, the field sizes of participants, voicing probability, sound loudness, speech fundamental frequencies and the sentiment scores (polarities) of text sentences in the closed captions. Experimental results on a dataset of 520 annotated news videos from three popular Brazilian TV newscasts and one popular American TV newscast show that our approach achieves an accuracy of up to 84% in the sentiment (tension level) classification task. This demonstrates its high potential for use by media analysts in several applications, especially in the journalistic domain.
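The abstract does not say exactly how the three modalities are combined. A common baseline is early (feature-level) fusion: standardize each modality's features and concatenate them into one vector per video before classification. The minimal sketch below shows that baseline only. The feature values are hypothetical stand-ins for, e.g., loudness, fundamental frequency or caption polarity.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a list of equal-length feature rows."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        # A constant column carries no information, so it is mapped to 0.0.
        [(value - mu) / sigma if sigma else 0.0 for value, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]


def early_fusion(*modalities):
    """Concatenate per-video feature rows from several modalities.

    Each modality is a list of rows (one row per video, same video order in
    every modality), standardized independently so that no modality
    dominates merely because of its numeric scale.
    """
    normalized = [zscore_columns(rows) for rows in modalities]
    return [sum(parts, []) for parts in zip(*normalized)]


if __name__ == "__main__":
    audio = [[1.0, 200.0], [3.0, 400.0]]  # hypothetical audio features per video
    text = [[0.5], [-0.5]]                # hypothetical caption polarity per video
    print(early_fusion(audio, text))
```

The fused vectors could then be fed to any standard classifier. Late fusion (combining per-modality classifier outputs) is the usual alternative.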
Topic Modeling in Twitter: Aggregating Tweets by Conversations
Alvarez-Melis, David (Massachusetts Institute of Technology) | Saveski, Martin (Massachusetts Institute of Technology)
We propose a new pooling technique for topic modeling in Twitter, which groups together tweets occurring in the same user-to-user conversation. Under this scheme, tweets and their replies are aggregated into a single document and the users who posted them are considered co-authors. To compare this new scheme against existing ones, we train topic models using Latent Dirichlet Allocation (LDA) and the Author-Topic Model (ATM) on datasets consisting of tweets pooled according to the different methods. Using the underlying categories of the tweets in this dataset as a noisy ground truth, we show that this new technique outperforms other pooling methods in terms of clustering quality and document retrieval.
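Conversation pooling can be sketched by following reply links up to the first tweet of each thread and merging everything under that root. The posting users become the document's co-authors, as the Author-Topic Model setting requires. This is a minimal sketch assuming a simplified, hypothetical tweet record (user, text, id of the tweet it replies to). It is not the authors' code.

```python
from collections import defaultdict


def pool_by_conversation(tweets):
    """Group tweets into conversation documents.

    `tweets` maps tweet id -> (user, text, in_reply_to_id or None).
    Returns {root_id: (concatenated_text, set_of_coauthors)}.
    """

    def root_of(tweet_id):
        # Follow reply links upward until reaching a tweet that is not a
        # reply, whose parent is missing from the collection, or that was
        # already visited (guards against malformed reply cycles).
        seen = set()
        while True:
            parent = tweets[tweet_id][2]
            if parent is None or parent not in tweets or tweet_id in seen:
                return tweet_id
            seen.add(tweet_id)
            tweet_id = parent

    texts = defaultdict(list)
    authors = defaultdict(set)
    for tweet_id, (user, text, _) in tweets.items():
        root = root_of(tweet_id)
        texts[root].append(text)
        authors[root].add(user)
    return {root: (" ".join(texts[root]), authors[root]) for root in texts}


if __name__ == "__main__":
    tweets = {
        1: ("alice", "hi", None),
        2: ("bob", "hello", 1),
        3: ("alice", "how are you", 2),
        4: ("carol", "unrelated", None),
    }
    print(pool_by_conversation(tweets))
```

Each pooled document can then be passed to LDA as a single document, or to the ATM together with its co-author set.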
Tweets and Votes: A Four-Country Comparison of Volumetric and Sentiment Analysis Approaches
Ahmed, Saifuddin (University of California, Davis) | Jaidka, Kokil (Adobe Research) | Skoric, Marko M (City University of Hong Kong)
This study analyzes different methodological approaches followed in the social media literature and their accuracy in predicting the general elections of four countries. Volumetric, unsupervised sentiment and supervised sentiment approaches are adopted to generate 12 metrics for computing predicted vote shares. The findings suggest that Twitter-based predictions can produce accurate election results, depending on a country's digital environment. A cross-country analysis helps evaluate the quality of the predictions and the influence of different contexts, such as technological development and democratic setups. We recommend that future scholars combine the volume, sentiment and network aspects of social media to model voting intentions in developing societies.
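To make the volumetric idea concrete, here is a minimal sketch of one such metric: each party's predicted vote share is its fraction of all counted tweets, and predictions are scored by mean absolute error (in percentage points) against the official result. A sentiment-based variant would count only positive tweets per party instead of all mentions. Party names and numbers are made up, and the study's 12 metrics are not reproduced here.

```python
def vote_shares(counts):
    """Normalize per-party tweet counts into predicted vote shares (percent)."""
    total = sum(counts.values())
    return {party: 100.0 * n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over the parties in `actual`."""
    return sum(abs(predicted.get(party, 0.0) - share) for party, share in actual.items()) / len(actual)


if __name__ == "__main__":
    mentions = {"Party A": 600, "Party B": 400}  # hypothetical mention counts
    actual = {"Party A": 55.0, "Party B": 45.0}  # hypothetical official result
    predicted = vote_shares(mentions)
    print(predicted, mean_absolute_error(predicted, actual))
```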
Sentiment-Based Topic Suggestion for Micro-Reviews
Lu, Ziyu (The University of Hong Kong) | Mamoulis, Nikos (University of Ioannina) | Pitoura, Evaggelia (University of Ioannina) | Tsaparas, Panayiotis (University of Ioannina)
Location-based social sites, such as Foursquare or Yelp, are gaining increasing popularity. These sites allow users to check in at venues and leave a short commentary in the form of a micro-review. Micro-reviews are rich in content, as they offer a distilled and concise account of user experience. In this paper, we consider the problem of predicting the topic of a micro-review by a user who visits a new venue. Such a prediction can help users make informed decisions and also help venue owners personalize users’ experiences. However, topic modeling for micro-reviews is particularly difficult due to their short and fragmented nature. We address this issue using pooling strategies, which aggregate micro-reviews at the venue or user level, and we propose novel probabilistic models based on Latent Dirichlet Allocation (LDA) for extracting the topics related to a user-venue pair. Our best topic model integrates influences from both inherent venue properties and user preferences, while also considering the sentiment orientation of the users. Experimental results on real datasets demonstrate the superiority of this model over simpler models and previous work. They also show that inherent venue properties have a stronger influence on the topics of micro-reviews.
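The pooling strategies mentioned above can be sketched in a few lines: concatenate all micro-reviews that share a venue (or a user) into one pseudo-document, which gives LDA longer texts to work with. This is a minimal sketch with a hypothetical record layout. It does not implement the paper's user-venue topic models.

```python
from collections import defaultdict


def pool_reviews(reviews, key):
    """Aggregate micro-reviews into one pseudo-document per venue or user.

    `reviews` is a list of dicts with "user", "venue" and "text" fields
    (a hypothetical layout); `key` selects the pooling level, "venue" or "user".
    """
    pooled = defaultdict(list)
    for review in reviews:
        pooled[review[key]].append(review["text"])
    return {group: " ".join(texts) for group, texts in pooled.items()}


if __name__ == "__main__":
    reviews = [
        {"user": "u1", "venue": "v1", "text": "great coffee"},
        {"user": "u2", "venue": "v1", "text": "slow service"},
        {"user": "u1", "venue": "v2", "text": "nice view"},
    ]
    print(pool_reviews(reviews, "venue"))
    print(pool_reviews(reviews, "user"))
```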
TweetGrep: Weakly Supervised Joint Retrieval and Sentiment Analysis of Topical Tweets
Guha, Satarupa (International Institute of Information Technology, Hyderabad) | Chakraborty, Tanmoy (University of Maryland, College Park) | Datta, Samik (Flipkart Internet Pvt. Ltd.) | Kumar, Mohit (Flipkart Internet Pvt. Ltd.) | Varma, Vasudeva (International Institute of Information Technology, Hyderabad)
An overwhelming amount of data is generated every day on social media, encompassing a wide spectrum of topics. With almost every business decision depending on customer opinion, mining social media data needs to be quick and easy. For a data analyst to keep up with the agility and scale of the data, it is impossible to rely on fully supervised techniques to mine topics and their associated sentiments from social media. Motivated by this, we propose a weakly supervised approach (named TweetGrep) that lets the data analyst easily define a topic with a few keywords and adapt a generic sentiment classifier to the topic, by jointly modeling topics and sentiments using label regularization. Experiments with diverse datasets show that TweetGrep beats the state-of-the-art models on both tasks, retrieving topical tweets and analyzing their sentiment (average improvements of 4.97% and 6.91%, respectively, in terms of area under the curve). Further, we show that TweetGrep can also be applied to a novel task, hashtag disambiguation, where it significantly outperforms the baseline methods.
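The abstract does not spell out TweetGrep's objective. The generic idea behind label regularization (also called expectation regularization) is to penalize a model when its average prediction over unlabeled data drifts from an expected label proportion. Below is a minimal sketch of that penalty as a KL divergence. The target proportions and predicted probabilities are hypothetical. In training, a term like this would be added to the supervised loss.

```python
import math


def label_regularization_penalty(target_proportions, predicted_probs, eps=1e-12):
    """KL divergence KL(target || mean prediction) over an unlabeled set.

    target_proportions: K expected label proportions (summing to 1).
    predicted_probs: per-example probability vectors of length K.
    """
    n = len(predicted_probs)
    k = len(target_proportions)
    # Average predicted distribution over all unlabeled examples.
    mean_pred = [sum(p[j] for p in predicted_probs) / n for j in range(k)]
    # Classes with zero target proportion contribute nothing to the KL sum;
    # eps guards against log(0) when the model never predicts a class.
    return sum(
        t * math.log(t / max(q, eps))
        for t, q in zip(target_proportions, mean_pred)
        if t > 0
    )


if __name__ == "__main__":
    target = [0.5, 0.5]  # hypothetical expected share of on-topic vs off-topic tweets
    print(label_regularization_penalty(target, [[0.9, 0.1], [0.1, 0.9]]))  # matches target
    print(label_regularization_penalty(target, [[0.9, 0.1], [0.9, 0.1]]))  # drifts from target
```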
Validity
A lot of discussion around Matt Jockers' Syuzhet package (involving Annie Swafford, Ted Underwood, Andrew Piper, Scott Weingart and many others) has focused on issues of validity -- whether sentiment analysis is accurate enough for the task, whether the Fourier transform is an appropriate method for dimensionality reduction, whether the emotional trajectories themselves are valid measurements of anything at all (Scott has a good enumeration of the various issues here.) Andrew's discussion of the validity of inherently subjective measurements inspired me to solicit at least one data point from readers that we can use for one question under discussion with Syuzhet: what does a human judgment of the "emotional trajectory" of a work look like, and how often do readers agree with each other on this task? This method of soliciting human judgments for inherently subjective tasks is at the core of NLP and a lot of machine learning. Syntactic parsing, part-of-speech tagging, named entity recognition, topic classification, sentiment analysis and many other tasks all rely on humans making judgments that are often surprisingly difficult in practice; learning algorithms in these cases are not so much learning any notion of "truth" as simply learning to reproduce the human judgments they're given. Agreement rates between humans are often seen as a proxy for the complexity of the task; if humans can't agree, it can be a sign that the task is ill-defined or underspecified. Word sense disambiguation is one good example of this, with low inter-annotator agreement rates [Snyder and Palmer 2004]. Sentiment analysis was originally designed with product/movie reviews in mind (does person X like product Y?) -- i.e., attitude with respect to a particular target. I think the more general sentiment-as-tone problem (is this tweet happy or sad?) is much less well specified, as a problem with an answer that can be judged by anyone but the original author.
One aspect of this kind of annotation that I think is much less explored (which Piper points to, and which I think would be an extremely interesting area to work on) is the case where multiple judgments are simultaneously valid -- different interpretations of the same phenomenon, each backed by its own argument.
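Agreement rates like the ones discussed above are often reported as a chance-corrected statistic such as Cohen's kappa. It compares how often two annotators actually agree with how often they would agree by chance, given how often each uses each label. Below is a minimal sketch for two readers labeling the same passages. The labels are hypothetical, and this is not the procedure used to collect the judgments solicited here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Probability of agreeing by chance if each annotator labeled
    # independently according to their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    reader_1 = ["happy", "happy", "sad", "sad"]
    reader_2 = ["happy", "sad", "sad", "sad"]
    print(cohen_kappa(reader_1, reader_2))
```

Here the readers agree on 3 of 4 passages (75%), but since half of that agreement is expected by chance, kappa is only 0.5.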
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
Mozetic, Igor | Grcar, Miha | Smailovic, Jasmina
What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
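If the sentiment classes are ordered, an agreement measure can penalize a negative-vs-positive disagreement more than a negative-vs-neutral one. One standard measure that does this is the linearly weighted Cohen's kappa, sketched below. It is an illustration chosen here and not necessarily one of the measures the authors used. The annotator labels are hypothetical.

```python
ORDER = {"negative": 0, "neutral": 1, "positive": 2}


def weighted_kappa(labels_a, labels_b, order=ORDER):
    """Linearly weighted Cohen's kappa for ordered sentiment classes.

    Disagreements are penalized in proportion to their distance on the
    ordinal scale, so negative-vs-positive costs twice negative-vs-neutral.
    """
    n = len(labels_a)
    k = len(order)
    xs = [order[a] for a in labels_a]
    ys = [order[b] for b in labels_b]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    observed = sum(weight(x, y) for x, y in zip(xs, ys)) / n
    # Expected disagreement if each annotator labeled independently
    # according to their own marginal class frequencies. If this is zero
    # (no possible disagreement), kappa is undefined and division fails.
    marg_a = [xs.count(c) / n for c in range(k)]
    marg_b = [ys.count(c) / n for c in range(k)]
    expected = sum(marg_a[i] * marg_b[j] * weight(i, j) for i in range(k) for j in range(k))
    return 1.0 - observed / expected


if __name__ == "__main__":
    annotator_1 = ["negative", "neutral", "positive", "positive"]
    annotator_2 = ["negative", "positive", "positive", "neutral"]
    print(weighted_kappa(annotator_1, annotator_2))
```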