One approach to assessing the overall opinion polarity (OvOP) of reviews, a concept defined in this paper, is the use of supervised machine learning. This paper examines how lexical filtering of reviews affects the accuracy of two statistical classifiers (Naive Bayes and Markov Model) in identifying OvOP. Two kinds of lexical filters are evaluated: one based on hypernymy as provided by WordNet (Fellbaum 1998), and one handcrafted filter based on part-of-speech (POS) tags. A ranking criterion based on a function of the probability of having positive or negative polarity is introduced and verified as being capable of achieving 100% accuracy with 10% recall. Movie reviews are used for training and evaluation of each statistical classifier, achieving 80% accuracy.
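The trade-off between accuracy and recall under a ranking criterion can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the add-one smoothing, and the use of the absolute log-odds as the ranking score are all assumptions chosen for illustration, and no lexical filter is applied.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs, label in {"pos", "neg"}.
    Both labels must occur at least once."""
    counts = {"pos": Counter(), "neg": Counter()}
    n_docs = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)
        n_docs[label] += 1
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, n_docs, vocab

def log_odds(tokens, model):
    """log P(pos | tokens) - log P(neg | tokens) under Naive Bayes
    with add-one smoothing; tokens outside the vocabulary are ignored."""
    counts, n_docs, vocab = model
    score = math.log(n_docs["pos"]) - math.log(n_docs["neg"])
    for label, sign in (("pos", 1), ("neg", -1)):
        total = sum(counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:
                score += sign * math.log((counts[label][tok] + 1) / total)
    return score

def classify_ranked(reviews, model, coverage=0.1):
    """Rank reviews by |log-odds| (assumed ranking score) and label only
    the most confident fraction; the rest are left unlabeled."""
    scored = []
    for i, tokens in enumerate(reviews):
        s = log_odds(tokens, model)
        scored.append((abs(s), "pos" if s > 0 else "neg", i))
    scored.sort(reverse=True)
    keep = max(1, int(len(reviews) * coverage))
    return {i: label for _, label, i in scored[:keep]}
```

Lowering `coverage` restricts the output to reviews whose polarity the classifier is most certain about, which is the mechanism by which a ranking criterion can trade recall for accuracy.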
In this paper, we study the characteristics useful for analyzing subjective content in documents. For that purpose, we present and evaluate a novel method based on the level of abstraction of nouns. By comparing state-of-the-art features and the level of abstraction of nouns between three annotated corpora and texts downloaded from Wikipedia and Web Blogs, we show that data sets for the classification of opinionated texts can be built automatically from the web, at the document level. Moreover, we present accuracy levels within domains of 96.5% and across domains of 74.5%.
We present a method for classifying texts automatically, based on their subjective content. We apply a standard method for calculating semantic orientation (Turney 2002), and expand it by giving more prominence to certain parts of the text, where we believe most subjective content is concentrated. We also apply a linguistic classification of Appraisal and find that it could be helpful in distinguishing different types of subjective texts (e.g., movie reviews from consumer product reviews).
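The idea of giving more prominence to certain parts of a text can be sketched as a position-weighted average of word orientation scores. This is only an illustration under stated assumptions: the `lexicon` of precomputed scores stands in for Turney-style semantic orientation values, and the `end_heavy` weighting (final third counts double) is hypothetical rather than the weighting used in the paper.

```python
def weighted_orientation(sentences, lexicon, weight_for_position):
    """Average the orientation scores of known words, weighting each
    word by the relative position (0.0 to 1.0) of its sentence."""
    total, weight_sum = 0.0, 0.0
    n = len(sentences)
    for idx, sentence in enumerate(sentences):
        rel_pos = idx / max(n - 1, 1)
        w = weight_for_position(rel_pos)
        for tok in sentence:
            if tok in lexicon:
                total += w * lexicon[tok]
                weight_sum += w
    # A text with no scored words is treated as neutral.
    return total / weight_sum if weight_sum else 0.0

def end_heavy(rel_pos):
    """Hypothetical weighting: the final third of the text counts double."""
    return 2.0 if rel_pos >= 2 / 3 else 1.0

# Toy lexicon of assumed orientation scores (positive > 0, negative < 0).
lexicon = {"good": 1.0, "bad": -1.0}
text = [["good"], ["good"], ["bad"]]

uniform = weighted_orientation(text, lexicon, lambda p: 1.0)
weighted = weighted_orientation(text, lexicon, end_heavy)
```

With uniform weights the toy text comes out mildly positive, while the end-heavy weighting pulls the score down to neutral because the closing sentence is negative.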
This paper systematically exploits various lexical features for opinion analysis on blog data using a statistical learning framework. Our experimental results using the TREC Blog track data show that all the features we explored effectively represent opinion expressions, and that different classification strategies have a significant impact on opinion classification performance. We also present results when combining opinion analysis with the retrieval component for the task of retrieving relevant and opinionated blogs. Compared with the best results in the TREC evaluation, our system achieves reasonable performance, but does not rely on much human knowledge or deep linguistic analysis.
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sources containing such opinions, e.g., product reviews, forums, discussion groups, and blogs. Techniques are now being developed to exploit these sources to help organizations and individuals to gain such important information easily and quickly. In this paper, we first discuss several aspects of the problem in the AI context, and then present some results of our existing work published in KDD-04 and WWW-05.