Due to the vast amount of user-generated content in the emerging Web 2.0, there is a growing need for computational sentiment analysis of documents. Most current research in this field is devoted to product reviews from websites. Microblogs and social networks pose an even greater challenge to sentiment classification. Marketing and political campaigns in particular, however, draw on opinions expressed on Twitter and other social communication platforms. The objects of interest in this paper are the presidential candidates of the Republican Party in the USA and their campaign topics. We introduce the combination of the noun phrases’ frequency and their pointwise mutual information (PMI) measure as a constraint on aspect extraction. This compensates for sparse phrases receiving a higher score than those composed of high-frequency words. Evaluation shows that the meronymy relationship between politicians and their topics holds and improves the accuracy of aspect extraction.
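The interplay between phrase frequency and PMI described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: adjacent word pairs stand in for noun phrases, and the function names and the thresholds `min_freq` and `min_pmi` are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def bigram_stats(tokenized_docs):
    """Count single words and adjacent word pairs over all documents."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in tokenized_docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(bigram, unigrams, bigrams):
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) )."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[bigram] / n_bi
    p_x = unigrams[bigram[0]] / n_uni
    p_y = unigrams[bigram[1]] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def select_aspects(tokenized_docs, min_freq=2, min_pmi=1.0):
    """Keep candidate phrases that pass BOTH constraints.

    Assumption: the combination is modelled as two thresholds; the
    abstract does not say how frequency and PMI are actually combined.
    """
    unigrams, bigrams = bigram_stats(tokenized_docs)
    kept = {}
    for bigram, freq in bigrams.items():
        score = pmi(bigram, unigrams, bigrams)
        if freq >= min_freq and score >= min_pmi:
            kept[bigram] = (freq, score)
    return kept


# Toy corpus: "rare phrase" occurs once, "tax reform" occurs twice.
docs = [
    ["tax", "reform", "now"],
    ["tax", "reform", "plan"],
    ["health", "care", "plan"],
    ["rare", "phrase", "here"],
]
aspects = select_aspects(docs)
```

In this toy corpus the one-off pair ("rare", "phrase") has a higher PMI than ("tax", "reform") because its words occur nowhere else, which is the sparse-phrase effect the abstract mentions. The frequency constraint removes it, so only ("tax", "reform") is kept.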
In this paper, we present a simplified shallow semantic parsing approach to extracting opinion targets. We formulate opinion target extraction (OTE) as a shallow semantic parsing problem, with the opinion expression as the predicate and the corresponding targets as its arguments. In principle, our parsing approach to OTE differs from the state-of-the-art sequence labeling approach in two respects. First, we model OTE at the parse-tree level, where abundant structured syntactic information is available, instead of at the word-sequence level, where only lexical information is available. Second, we focus on determining whether a constituent, rather than a word, is an opinion target, via a simplified shallow semantic parsing framework. Evaluation on two datasets shows that structured syntactic information plays a critical role in capturing the domination relationship between an opinion expression and its targets. It also shows that our parsing approach substantially outperforms the state-of-the-art sequence labeling approach.
Chen, Lu (Wright State University) | Wang, Wenbo (Wright State University) | Nagarajan, Meenakshi (IBM Almaden Research Center) | Wang, Shaojun (Wright State University) | Sheth, Amit P. (Wright State University)
The automatic extraction of sentiment expressions from informal text, as found in microblogs such as tweets, is a recent area of investigation. Compared to formal text, such as product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions, which cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., a movie or a person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression, since the polarity of a sentiment expression is determined by the nature of its target; (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus size.
Though polarity classification has been extensively explored at the document level, there has been little work investigating feature design at the sentence level. Because a sentence contains only a small number of words, polarity classification at the sentence level differs substantially from document-level classification: the resulting bag-of-words feature vectors tend to be very sparse, which lowers classification accuracy. In this paper, we show that performance can be improved by adding features specifically designed for sentence-level polarity classification. We consider both explicit polarity information and various linguistic features. A large proportion of the improvement that can be obtained by using polarity information can also be achieved by using a set of simple domain-independent linguistic features.
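The sparsity argument in this abstract can be made concrete with a small sketch that compares how much of a shared vocabulary a single sentence covers with how much a whole document covers. The toy text, the whitespace tokenization, and the helper names `bow_vector` and `density` are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Bag-of-words count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def density(vector):
    """Fraction of non-zero entries; low density means a sparse vector."""
    return sum(1 for value in vector if value != 0) / len(vector)


# Toy document made of three short sentences (hypothetical data).
sentences = [
    "the plot was great".split(),
    "the acting was poor".split(),
    "i would not watch it again".split(),
]
document = [token for sentence in sentences for token in sentence]
vocabulary = sorted(set(document))

sentence_vector = bow_vector(sentences[0], vocabulary)
document_vector = bow_vector(document, vocabulary)

sentence_density = density(sentence_vector)
document_density = density(document_vector)
```

Here the first sentence touches 4 of the 12 vocabulary entries, while the document touches all of them. With a realistic vocabulary of many thousands of words the gap is far larger, which is why extra sentence-level features can help.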
Sentiment mining is a computational approach used to identify expressions made about topics within a span of text. The blogosphere is a particularly useful corpus for sentiment mining because bloggers express a wide variety of opinions and sentiments in their online journals. Previous work on sentiment identification and extraction has focused primarily on using machine-learning methods to extract sentiment patterns. Annotating the text corpora that such methods require, however, is a time-consuming process. In this paper, we present a streamlined approach to extracting sentiments from untagged text. We use heuristic models to quickly identify sentiment expressions and target subjects. This approach enables the rapid identification and extraction of expressions about topics.