Yasavur, Ugan (Florida International University) | Travieso, Jorge (Florida International University) | Lisetti, Christine (Florida International University) | Rishe, Naphtali David (Florida International University)
There is increasing interest in valence and emotion sensing using a variety of signals. Text, as a communication channel, attracts substantial interest for recognizing its underlying sentiment (valence or polarity), affect, or emotion (e.g. happiness, sadness). We consider recognizing the valence of a sentence to be a task prior to emotion sensing. In this article, we discuss our approach to classifying sentences in terms of emotional valence. Our supervised system performs syntactic and semantic analysis for feature extraction. It processes the interactions between words in sentences by using dependency parse trees, and it can decide the current polarity of named entities based on on-the-fly topic modeling. We compared three rule-based approaches and two supervised approaches (i.e. Naive Bayes and Maximum Entropy). We trained and tested our system using the SemEval-2007 affective text dataset, which contains news headlines extracted from news websites. Our results show that our systems outperform the systems demonstrated in SemEval-2007.
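To make the supervised side of the comparison concrete, here is a minimal sketch of a multinomial Naive Bayes valence classifier over bag-of-words features. It is not the authors' system: it omits the dependency-parse and topic-modeling features described above, the function names `train_nb` and `predict_nb` are hypothetical, and the four headlines are invented toy data rather than SemEval-2007 items.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy headlines, for illustration only.
TOY_HEADLINES = [
    ("team wins title", "positive"),
    ("markets rally on strong growth", "positive"),
    ("storm kills dozens", "negative"),
    ("crash leaves city in mourning", "negative"),
]

model = train_nb(TOY_HEADLINES)
print(predict_nb(model, "storm kills"))
```

A Maximum Entropy classifier would use the same features but fit weights discriminatively instead of estimating per-class word probabilities.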
Sentiment mining is a computational approach used to identify expressions made about topics within a span of text. The blogosphere is a particularly useful corpus for sentiment mining because bloggers express a wide variety of opinions and sentiments in their online journals. Previous work on sentiment identification and extraction has focused primarily on using machine-learning methods to extract sentiment patterns. Annotating text corpora, however, is a time-consuming process. In this paper, we present a streamlined approach to extracting sentiments from untagged text. We use heuristic models to quickly identify sentiment expressions and target subjects. This is an enabling approach to the rapid identification and extraction of expressions about topics.
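The abstract does not spell out its heuristics, so the following is only one plausible sketch of a heuristic that pairs sentiment expressions with target subjects in untagged text. It assumes a small hand-written polarity lexicon (`LEXICON`), a caller-supplied set of target terms, and a nearest-target-within-a-window rule; all of these are illustrative assumptions, not the paper's model.

```python
# Hypothetical polarity lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "great": 1, "hate": -1, "terrible": -1}


def extract_sentiments(text, targets, window=4):
    """Pair each lexicon word with the nearest target term within `window` tokens.

    Returns a list of (target, sentiment_word, polarity) tuples.
    """
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    results = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        # Candidate targets as (distance, position), restricted to the window.
        candidates = [
            (abs(i - j), j)
            for j, other in enumerate(tokens)
            if other in targets and abs(i - j) <= window
        ]
        if candidates:
            _, j = min(candidates)  # closest target; ties go to the earlier one
            results.append((tokens[j], tok, LEXICON[tok]))
    return results


pairs = extract_sentiments(
    "I love the camera but the battery is terrible.",
    targets={"camera", "battery"},
)
print(pairs)
```

A rule like this needs no annotated corpus, which is the trade-off the abstract points to: it is fast to apply but blind to syntax, so it can attach a sentiment word to the wrong subject.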
User-generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their posts on blogs, message boards, etc. with respect to topics expressed as a search query. In the scenario we consider, the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment, and their scores are aggregated (using a data structure we introduce called the sentiment propagation graph) to produce an aggregate score for the input entity. A crucial part of the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions; this is the focus of our paper. We present an approach in which potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines), where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussion blogs annotated with semantic types of mentions, coreference, meronymy and sentiment targets. The corpus proves that coreference and meronymy are not marginal phenomena but are really central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results which we believe push the current state of the art.
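The abstract does not define the sentiment propagation graph in detail. The sketch below therefore only illustrates the general idea of aggregating mention-level scores up a part-of (meronymy) hierarchy to a top-level entity. The `Node` class, the `aggregate` function, the fixed `part_weight` discount, and the example scores are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity or one of its parts, with scores of its (coreferent) mentions."""
    name: str
    mention_scores: list = field(default_factory=list)
    parts: list = field(default_factory=list)  # meronyms of this entity


def aggregate(node, part_weight=0.5):
    """Sum a node's own mention scores plus discounted scores of its parts."""
    total = sum(node.mention_scores)
    for part in node.parts:
        total += part_weight * aggregate(part, part_weight)
    return total


# Invented example: one mention of the camera itself, plus mentions of two parts.
camera = Node(
    "camera",
    mention_scores=[1.0],
    parts=[
        Node("battery", mention_scores=[-1.0]),
        Node("lens", mention_scores=[1.0, 1.0]),
    ],
)

print(aggregate(camera))
```

Without the part nodes, the camera would receive a score from its single direct mention only, which illustrates why the abstract treats coreference and meronymy as central rather than marginal.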
Sentiment Classification (SC) is about assigning a positive, negative or neutral label to a piece of text based on its overall opinion. This paper describes our in-progress work on extracting the meaning of words for SC. In particular, we investigate the utility of sense-level polarity information for SC. We first show that methods based on common classification features are not robust and that their performance varies widely across different domains. We then show that sense-level polarity features can significantly improve the performance of SC. We use datasets in different domains to study the robustness of the designated features. Our preliminary results show that using the most common sense of each word yields the most robust results across different domains. In addition, we observe that sense-level polarity information is useful for producing a set of high-quality seed words, which can be used for further improvement of the SC task.
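As a rough illustration of the most-common-sense strategy, the sketch below scores a text using only the polarity of each word's first-listed (assumed most frequent) sense. The sense inventory `SENSES`, its glosses, and its polarity values are invented for this example; a real system would draw them from a sense-annotated lexical resource.

```python
# Hypothetical sense inventory: senses are listed most frequent first,
# each as (gloss, polarity in [-1, 1]). All values are invented.
SENSES = {
    "fine": [("of good quality", 0.5), ("a monetary penalty", -0.5)],
    "cold": [("having a low temperature", 0.0), ("unfriendly", -0.75)],
    "cheap": [("low in price", 0.25), ("of poor quality", -0.5)],
}


def first_sense_polarity(word):
    """Polarity of the word's most common sense, or 0.0 if the word is unknown."""
    senses = SENSES.get(word)
    return senses[0][1] if senses else 0.0


def classify(text):
    """Label a text by the sign of its summed first-sense polarities."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    score = sum(first_sense_polarity(tok) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("A fine cold drink."))
```

Here "cold" contributes nothing because its first sense is neutral; a word-level lexicon that merged both senses would have pulled the score toward negative.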
This paper focusses on the main issues related to the development of a corpus for opinion and sentiment analysis, with special attention to irony, and presents as a case study Senti-TUT, a project for Italian aimed at investigating sentiment and irony in social media. We present the Senti-TUT corpus, a collection of texts from Twitter annotated with sentiment polarity. We describe the dataset, the annotation, the methodologies applied, and our investigations of two important features of irony: polarity reversing and emotion expressions.