Information Extraction
Acquiring Commonsense Knowledge for Sentiment Analysis through Human Computation
Boia, Marina (École Polytechnique Fédérale de Lausanne) | Musat, Claudiu Cristian (École Polytechnique Fédérale de Lausanne) | Faltings, Boi (École Polytechnique Fédérale de Lausanne)
Many Artificial Intelligence tasks need large amounts of commonsense knowledge. Because obtaining this knowledge through machine learning would require a huge amount of data, a better alternative is to elicit it from people through human computation. We consider the sentiment classification task, where knowledge about the contexts that impact word polarities is crucial, but hard to acquire from data. We describe a novel task design that allows us to crowdsource this knowledge through Amazon Mechanical Turk with high quality. We show that the commonsense knowledge acquired in this way dramatically improves the performance of established sentiment classification methods.
Prediction of Helpful Reviews Using Emotions Extraction
Martin, Lionel (École Polytechnique Fédérale de Lausanne) | Pu, Pearl (École Polytechnique Fédérale de Lausanne)
Reviews play an increasingly important role in the decision process of buying products and booking hotels. However, the large amount of available information can be confusing to users. A more succinct interface, gathering only the most helpful reviews, can reduce information processing time and save effort. To create such an interface in real time, we need reliable prediction algorithms to classify and predict new reviews which have not yet been voted on but are potentially helpful. So far, such helpfulness prediction algorithms have benefited from structural aspects, such as the length and readability score. Since emotional words are at the heart of our written communication and are powerful in triggering listeners' attention, we believe that emotional words can serve as important parameters for predicting the helpfulness of review text. Using GALC, a general lexicon of emotional words associated with a model representing 20 different categories, we extracted the emotionality from the review text and applied a supervised classification method to derive the emotion-based helpful review prediction. As the second contribution, we propose an evaluation framework comparing three different real-world datasets extracted from the most well-known product review websites. This framework shows that emotion-based methods outperform the structure-based approach by up to 9%.
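The lexicon-based extraction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the word stems in TOY_LEXICON, and the prefix-matching rule are all hypothetical stand-ins (the real GALC lexicon and its 20 categories are not reproduced here). The sketch only shows how per-category counts could be turned into a feature vector for a supervised classifier.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: category -> word stems. These are invented for
# illustration and are NOT the actual GALC entries or categories.
TOY_LEXICON = {
    "joy": ("happ", "joy", "delight"),
    "anger": ("angr", "furious", "rage"),
    "disappointment": ("disappoint", "letdown"),
}


def emotion_features(review):
    """Count, per category, the words that start with one of its stems."""
    words = re.findall(r"[a-z']+", review.lower())
    counts = Counter()
    for word in words:
        for category, stems in TOY_LEXICON.items():
            # str.startswith accepts a tuple of prefixes
            if word.startswith(stems):
                counts[category] += 1
    # Fixed category order so every review yields a comparable vector
    return [counts[category] for category in TOY_LEXICON]


features = emotion_features(
    "Happy and delighted, but the battery was disappointing."
)
print(features)  # [2, 0, 1] for the categories joy, anger, disappointment
```

A vector like this, one per review, could then be passed to any supervised classifier together with the helpfulness labels; which classifier the authors used is not stated in the abstract.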
Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis
Dong, Li (Beihang University) | Wei, Furu (Microsoft Research) | Zhou, Ming (Microsoft Research) | Xu, Ke (Beihang University)
Recursive neural models have achieved promising results in many natural language processing tasks. The main difference among these models lies in the composition function, i.e., how to obtain the vector representation for a phrase or sentence using the representations of the words it contains. This paper introduces a novel Adaptive Multi-Compositionality (AdaMC) layer for recursive neural models. The basic idea is to use more than one composition function and to select among them adaptively depending on the input vectors. We present a general framework to model each semantic composition as a distribution over these composition functions. The composition functions and the parameters used for adaptive selection are learned jointly from data. We integrate AdaMC into existing recursive neural models and conduct extensive experiments on the Stanford Sentiment Treebank. The results illustrate that AdaMC significantly outperforms state-of-the-art sentiment classification methods. It helps push the best accuracy of sentence-level negative/positive classification from 85.4% up to 88.5%.
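The idea of modeling a composition as a distribution over several composition functions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: each composition function is assumed to be a tanh of a linear map over the concatenated child vectors, the selection distribution is assumed to be a plain softmax over a second linear map, and the weights are random rather than learned. The names adaptive_compose, comp_weights, and select_weights are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_compose(left, right, comp_weights, select_weights):
    """Combine two child vectors into a parent vector.

    The parent is the probability-weighted mixture of several candidate
    compositions; the probabilities depend on the input vectors.
    """
    children = left + right  # concatenation of the two child vectors
    probs = softmax(matvec(select_weights, children))
    candidates = [
        [math.tanh(z) for z in matvec(weights, children)]
        for weights in comp_weights
    ]
    parent = [
        sum(p * cand[i] for p, cand in zip(probs, candidates))
        for i in range(len(left))
    ]
    return parent, probs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: 3-dimensional vectors, 4 composition functions.
rng = random.Random(0)
DIM, NUM_FUNCS = 3, 4
comp_weights = [random_matrix(DIM, 2 * DIM, rng) for _ in range(NUM_FUNCS)]
select_weights = random_matrix(NUM_FUNCS, 2 * DIM, rng)

parent, probs = adaptive_compose(
    [0.1, -0.2, 0.3], [0.4, 0.0, -0.1], comp_weights, select_weights
)
```

In a trained model, comp_weights and select_weights would be learned jointly, as the abstract states; here they are random so that the sketch stays self-contained.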
SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis
Cambria, Erik (Nanyang Technological University) | Olsher, Daniel (Carnegie Mellon University) | Rajagopal, Dheeraj (National University of Singapore)
SenticNet is a publicly available semantic and affective resource for concept-level sentiment analysis. Rather than using graph-mining and dimensionality-reduction techniques, SenticNet 3 makes use of "energy flows" to connect various parts of extended common and common-sense knowledge representations to one another. SenticNet 3 models nuanced semantics and sentics (that is, the conceptual and affective information associated with multi-word natural language expressions), representing information with a symbolic opacity of an intermediate nature between that of neural networks and typical symbolic systems.
Experiments on Visual Information Extraction with the Faces of Wikipedia
Hasan, Md. Kamrul (Polytechnique Montréal) | Pal, Christopher Joseph (Polytechnique Montréal)
We present a series of visual information extraction experiments using the Faces of Wikipedia database, a new resource that we release into the public domain for both recognition and extraction research, containing over 50,000 identities and 60,000 disambiguated images of faces. We compare different techniques for automatically extracting the faces corresponding to the subject of a Wikipedia biography within the images appearing on the page. Our top performing approach is based on probabilistic graphical models and uses the text of Wikipedia pages, similarities of faces, as well as various other features of the document, metadata and image files. Our method resolves the problem jointly for all detected faces on a page. While our experiments focus on extracting faces from Wikipedia biographies, our approach is easily adapted to other types of documents and multiple documents. We focus on Wikipedia because the content is a Creative Commons resource, and we provide our database to the community, including registered faces, hand-labeled and automated disambiguations, processed captions, metadata and evaluation protocols. Our best probabilistic extraction pipeline yields an expected average accuracy of 77%, compared to image-only and text-only baselines, which yield 63% and 66% respectively.
Common and Common-Sense Knowledge Integration for Concept-Level Sentiment Analysis
Cambria, Erik (Massachusetts Institute of Technology) | Howard, Newton (Massachusetts Institute of Technology)
In the era of Big Data, knowledge integration is key for tasks such as social media aggregation, opinion mining, and cyber-issue detection. The integration of different kinds of knowledge coming from multiple sources, however, is often a problematic issue as it either requires a lot of manual effort in defining aggregation rules or suffers from noise generated by automatic integration techniques. In this work, we propose a method based on conceptual primitives for efficiently integrating pieces of knowledge coming from different common and common-sense resources, which we test in the field of concept-level sentiment analysis.
A Survey of Data Mining Techniques for Social Media Analysis
Adedoyin-Olowe, Mariam | Gaber, Mohamed Medhat | Stahl, Frederic
Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, which makes computational means of analysis necessary. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of social networks over the decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.
EmotionWatch: Visualizing Fine-Grained Emotions in Event-Related Tweets
Kempter, Renato (Swiss Federal Institute of Technology Lausanne (EPFL)) | Sintsova, Valentina (Swiss Federal Institute of Technology Lausanne (EPFL)) | Musat, Claudiu (Swiss Federal Institute of Technology Lausanne (EPFL)) | Pu, Pearl (Swiss Federal Institute of Technology Lausanne (EPFL))
Spectators are increasingly using social platforms to express their opinions and share their emotions during big public events. Those reactions reveal the subjective perception of the event and extend its understanding. This has motivated us to develop a system to explore and visualize volume, patterns, and trends of user sentiments as they evolve over time. Previous work in sentiment analysis and opinion mining has addressed these issues, but most of it distinguishes only two polarity categories, leaving a more detailed and insightful analysis to be desired. In this paper, we suggest using a fine-grained, multi-category emotion model to classify and visualize users' emotional reactions in public events. We describe EmotionWatch, a tool that constructs visual summaries of public emotions, and apply it to the 2012 Olympics as a test case. We report findings from a user study evaluating the usability of the tool and validating the emotion model. Results show that users prefer a more detailed inspection of public emotions over the simplified analysis. Despite its complexity, users were able to effectively grasp, understand, and interpret the emotional reactions using EmotionWatch. The same user study also pointed out a few design improvements for the future development of analogous systems.
VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text
Hutto, C. J. (Georgia Institute of Technology) | Gilbert, Eric (Georgia Institute of Technology)
The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Maximum Entropy, and Support Vector Machine (SVM) algorithms. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Interestingly, using our parsimonious rule-based model to assess the sentiment of tweets, we find that VADER outperforms individual human raters (F1 Classification Accuracy = 0.96 and 0.84, respectively), and generalizes more favorably across contexts than any of our benchmarks.
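The combination of a valence lexicon with a handful of heuristic rules can be illustrated with a minimal sketch. It is not the VADER implementation: the lexicon entries, their valence values, and every constant below are invented for illustration, and the five rules (exclamation marks, capitalization, degree modifiers, contrastive "but", negation) are simplified versions of the kinds of conventions the abstract refers to. The function name score is hypothetical.

```python
import math

# Hypothetical toy lexicon and constants; all values are invented and do
# not come from the VADER lexicon or paper.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}
BOOSTERS = {"very": 0.3, "extremely": 0.3, "slightly": -0.3}
NEGATIONS = {"not", "never", "no"}
CAPS_BOOST = 0.7
NEGATION_SCALE = -0.74
BANG_BOOST = 0.3
NORMALIZER = 15.0


def score(text):
    """Return a sentiment score in (-1, 1) for a short text."""
    tokens = [t.strip(".,!?") for t in text.split()]
    tokens = [t for t in tokens if t]
    lowered = [t.lower() for t in tokens]
    all_caps = all(t.isupper() for t in tokens)

    valences = []
    for i, word in enumerate(lowered):
        valence = LEXICON.get(word, 0.0)
        if valence != 0.0:
            # Rule: an ALL-CAPS sentiment word in mixed-case text is emphasized
            if tokens[i].isupper() and len(tokens[i]) > 1 and not all_caps:
                valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
            # Rule: a degree modifier right before the word scales intensity
            if i > 0 and lowered[i - 1] in BOOSTERS:
                boost = BOOSTERS[lowered[i - 1]]
                valence += boost if valence > 0 else -boost
            # Rule: a negation within the three preceding tokens flips polarity
            if any(w in NEGATIONS for w in lowered[max(0, i - 3):i]):
                valence *= NEGATION_SCALE
        valences.append(valence)

    # Rule: after a contrastive "but", the second clause dominates
    if "but" in lowered:
        pivot = lowered.index("but")
        valences = [
            v * 0.5 if i < pivot else v * 1.5 if i > pivot else v
            for i, v in enumerate(valences)
        ]

    total = sum(valences)
    # Rule: exclamation marks amplify the existing polarity (capped at three)
    bangs = min(text.count("!"), 3)
    if total > 0:
        total += bangs * BANG_BOOST
    elif total < 0:
        total -= bangs * BANG_BOOST

    # Squash the unbounded sum into the open interval (-1, 1)
    return total / math.sqrt(total * total + NORMALIZER)
```

With these toy values, score("not good") is negative, score("very good") exceeds score("good"), and score("great but terrible") is negative because the clause after "but" is weighted more heavily.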