This paper presents experiments on subjectivity and polarity classification of topic- and genre-independent blog posts, making novel use of a linguistic feature, verb class information, and of an online resource, the Wikipedia dictionary, for determining the polarity of adjectives. Each post from a blog is classified as objective, positive, or negative. Our method of determining the polarity of adjectives has an accuracy rate of 90.9%. Accuracy rates of two verb classes demonstrating polarity are 89.3% and 91.2%. Initial classifier results show blog-post accuracies significantly above the established baseline classification.
Subjectivity tagging distinguishes sentences used to present opinions and evaluations from sentences used to objectively present factual information. There are numerous applications for which subjectivity tagging is relevant, including information extraction and information retrieval. This paper identifies strong clues of subjectivity using the results of a method for clustering words according to distributional similarity (Lin 1998), seeded by a small amount of detailed manual annotation. These features are then further refined with the addition of lexical semantic features of adjectives, specifically polarity and gradability (Hatzivassiloglou & McKeown 1997), which can be automatically learned from corpora. In 10-fold cross-validation experiments, features based on both similarity clusters and the lexical semantic features are shown to have higher precision than features based on either alone.
In this paper, we present a series of semantic analyses of words in political blogs in the setting of categorization of two opposite political orientations: liberal vs. conservative. We classify nouns, verbs, adjectives and adverbs into semantic categories by using the General Inquirer dictionary. The distributions of these categories and the correlations among them are then examined both within and between blogs of the two opposite political leanings. Results show that, although within blogs of a given political leaning words of certain categories tend to appear together while others do not, the semantic category distribution of words used by left-wing bloggers is very similar to that of words used by right-wing bloggers, suggesting that single words alone do not account for the major differences between these two categories of blogs. Lastly, by examining preliminary results of association rule mining of nouns, verbs, adjectives and adverbs in sentences, we posit that the similarities and differences between blogs of opposite political orientations can be detected by extracting opinion expressions around collocations of nouns and verbs (together with their modifiers).
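As an illustration of the distribution comparison described above, the following minimal sketch maps tokens to semantic categories and compares two category distributions. It rests on several assumptions that are not taken from the abstract: the dictionary `word_to_cats` is a hypothetical stand-in for a General Inquirer lookup, the category names are invented for the example, and Jensen-Shannon divergence is just one possible similarity measure, since the abstract does not say which one was used.

```python
from collections import Counter
from math import log2


def category_distribution(tokens, word_to_cats):
    """Relative frequency of each semantic category over a token list.

    word_to_cats is a hypothetical word -> list-of-categories mapping;
    words missing from it are ignored.
    """
    counts = Counter()
    for tok in tokens:
        for cat in word_to_cats.get(tok.lower(), ()):
            counts[cat] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    0.0 means identical distributions, 1.0 means disjoint support.
    This measure is an assumption made for illustration only.
    """
    cats = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cats}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m
        return sum(a[c] * log2(a[c] / m[c]) for c in a if a[c] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy lexicon with invented category names
lexicon = {"tax": ["Econ"], "freedom": ["Polit", "Virtue"], "war": ["Polit"]}
left = category_distribution("tax freedom war".split(), lexicon)
right = category_distribution("war freedom tax tax".split(), lexicon)
divergence = js_divergence(left, right)
```

A value of `divergence` close to zero would correspond to the finding that the two groups of bloggers draw on nearly the same mix of categories.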
This paper describes a bootstrapping algorithm for acquiring a lexicon of subjective adjectives which minimizes the recourse to external resources (such as lexical databases, parsers, and manual annotation work). The method employs only a corpus tagged with part-of-speech information and a seed set of subjective adjectives. The list of candidate subjective adjectives is generated incrementally by looking at the head nouns they modify and computing their distribution-based semantic similarity (cosine) with respect to the seed set and its successive extensions. The advantages of a method using limited resources include the following: a) it can be used for languages other than English for which resources such as parsers and annotated corpora are not available, but a part-of-speech tagger is; b) it can be used for English as well when fast and low-cost development is required in specific sub-domains of subjective language.
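The incremental procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the input is assumed to be a list of (adjective, head noun) pairs already extracted from a POS-tagged corpus, the current lexicon is summarized by the sum of its members' noun-count vectors, and the acceptance `threshold` and `max_rounds` parameters are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def noun_profiles(pairs):
    """Map each adjective to a Counter of the head nouns it modifies."""
    profiles = {}
    for adj, noun in pairs:
        profiles.setdefault(adj, Counter())[noun] += 1
    return profiles


def bootstrap(pairs, seeds, threshold=0.5, max_rounds=5):
    """Grow a seed set of adjectives by head-noun distribution similarity.

    threshold and max_rounds are assumed parameters for illustration.
    """
    profiles = noun_profiles(pairs)
    lexicon = set(seeds)
    for _ in range(max_rounds):
        # Summarize the current lexicon as one aggregate noun vector
        centroid = Counter()
        for adj in lexicon:
            centroid.update(profiles.get(adj, {}))
        # Accept candidates that are similar enough to the current lexicon
        new = {
            adj for adj, profile in profiles.items()
            if adj not in lexicon and cosine(profile, centroid) >= threshold
        }
        if not new:
            break
        lexicon |= new
    return lexicon


pairs = [
    ("awful", "movie"), ("awful", "plot"),
    ("terrible", "movie"), ("terrible", "plot"),
    ("wooden", "table"), ("wooden", "chair"),
]
grown = bootstrap(pairs, seeds={"awful"})
```

In this toy run, "terrible" shares its head nouns with the seed "awful" and is added, while "wooden" modifies unrelated nouns and is left out. Each round recomputes the aggregate vector from the extended lexicon, which mirrors the "successive extensions" of the seed set.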
We present a method for classifying texts automatically, based on their subjective content. We apply a standard method for calculating semantic orientation (Turney 2002), and expand it by giving more prominence to certain parts of the text, where we believe most subjective content is concentrated. We also apply a linguistic classification of Appraisal and find that it could be helpful in distinguishing different types of subjective texts (e.g., movie reviews from consumer product reviews).
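For reference, Turney (2002) defines the semantic orientation of a phrase as its pointwise mutual information with "excellent" minus its pointwise mutual information with "poor". The sketch below illustrates only the idea of weighting such scores by position in the text. It makes several assumptions that do not come from the abstract: `so_scores` is a hypothetical precomputed word-level score table, and the `end_heavy` weighting (doubling the weight of the final third) is an invented example, since the abstract does not state which parts of the text receive more prominence.

```python
def weighted_orientation(tokens, so_scores, weight_fn):
    """Sum per-token orientation scores, weighted by relative position.

    so_scores: hypothetical word -> semantic orientation mapping.
    weight_fn: maps a relative position in [0, 1) to a weight.
    """
    total = 0.0
    n = len(tokens)
    for i, tok in enumerate(tokens):
        score = so_scores.get(tok.lower())
        if score is None:
            continue  # token carries no known orientation
        total += weight_fn(i / n) * score
    return total


def uniform(pos):
    """Baseline: every part of the text counts equally."""
    return 1.0


def end_heavy(pos):
    """Assumed weighting: the final third of the text counts double."""
    return 2.0 if pos >= 2 / 3 else 1.0


tokens = "the plot was boring but overall a great film".split()
so_scores = {"boring": -1.0, "great": 1.0}  # toy scores, not real data

flat = weighted_orientation(tokens, so_scores, uniform)
weighted = weighted_orientation(tokens, so_scores, end_heavy)
```

With uniform weights the two opinion words cancel out, whereas the position-sensitive weighting lets the late positive word dominate, so the text as a whole comes out positive.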