Domain Specific Affective Classification of Documents

AAAI Conferences

In this paper, we describe a set of techniques that can be used to classify weblogs (blogs) by emotional content. Instead of using a general-purpose emotional classification strategy, our technique generates domain-specific sentiment classifiers and uses them to determine the emotional state of weblogs within each domain.
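As a rough illustration of the idea (not the authors' method), here is a minimal sketch of training one sentiment classifier per domain; the corpus_by_domain structure, the helper names, and the TF-IDF plus logistic-regression pipeline are assumptions made only for the example:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_domain_classifiers(corpus_by_domain):
    """corpus_by_domain: {domain: (blog_posts, emotion_labels)} -- hypothetical input."""
    classifiers = {}
    for domain, (texts, labels) in corpus_by_domain.items():
        # Each domain gets its own vocabulary and model, so domain-specific
        # emotion vocabulary is not diluted by other domains.
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LogisticRegression(max_iter=1000),
        )
        model.fit(texts, labels)
        classifiers[domain] = model
    return classifiers

def classify_post(classifiers, domain, post):
    # Route a new post to the classifier trained on its own domain.
    return classifiers[domain].predict([post])[0]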


Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations

AAAI Conferences

User-generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their blog and message board posts with respect to topics expressed as a search query. In the scenario we consider, the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment and their scores are aggregated (using a data structure we introduce, called the sentiment propagation graph) to produce an aggregate score for the input entity. A crucial part of the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions; this is the focus of our paper. We present an approach where potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines), where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussion blogs annotated with semantic types of mentions, coreference, meronymy, and sentiment targets. The corpus shows that coreference and meronymy are not marginal phenomena but are central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results that we believe push the current state of the art.
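As a minimal sketch of the ranking step (not the authors' system), candidate target mentions can be scored with an SVM over the typed dependency path connecting them to the sentiment expression. The path strings and the helper functions below are illustrative assumptions; the paths themselves would come from an external dependency parser:

from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def train_target_ranker(training_pairs):
    """training_pairs: list of (dependency_path_string, is_true_target) tuples (hypothetical)."""
    features = [{"path": path} for path, _ in training_pairs]
    labels = [int(is_target) for _, is_target in training_pairs]
    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform(features)  # one-hot encode each distinct path
    model = LinearSVC()
    model.fit(X, labels)
    return vectorizer, model

def rank_candidates(vectorizer, model, candidate_paths):
    """Return candidate paths ordered by SVM decision score (most likely target first)."""
    X = vectorizer.transform([{"path": p} for p in candidate_paths])
    scores = model.decision_function(X)
    return sorted(zip(candidate_paths, scores), key=lambda pair: -pair[1])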


Enhancing Event Descriptions through Twitter Mining

AAAI Conferences

We describe a simple IR approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We show that methods based on word co-occurrence clustering, domain-specific keywords, and named entity recognition improve performance over a basic approach.
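A minimal sketch of one of the simpler query-construction strategies (named entities plus domain-specific keywords), not the paper's implementation; the spaCy model, the DOMAIN_KEYWORDS list, and build_event_query are assumptions made for illustration:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

DOMAIN_KEYWORDS = {"earthquake", "flood", "outbreak"}  # hypothetical domain terms

def build_event_query(event_description, max_terms=6):
    doc = nlp(event_description)
    # Named entities (people, places, organizations) anchor the query ...
    entities = [ent.text for ent in doc.ents]
    # ... and domain keywords found in the description narrow it to the event type.
    keywords = [tok.text for tok in doc if tok.lemma_.lower() in DOMAIN_KEYWORDS]
    terms = list(dict.fromkeys(entities + keywords))[:max_terms]  # deduplicate, keep order
    return " ".join(f'"{t}"' if " " in t else t for t in terms)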


The Impact of News Values and Linguistic Style on the Popularity of Headlines on Twitter and Facebook

AAAI Conferences

A large proportion of audiences read news online, often accessing news articles through social media such as Facebook or Twitter. A distinguishing characteristic of news on social media is that the most prominent (and often the only visible) part of the news article is the headline. We investigate the impact of headline characteristics, including journalistic concepts of news values and linguistic style, on an article's social media popularity. Using a large corpus of headlines from The Guardian and The New York Times, we derive these features automatically and correlate them with social media popularity on Twitter and Facebook. We find that most of them have a significant effect and that their impact differs between the two platforms and between news outlets. Further investigation with a crowdsourced study confirms that news values and style influence audiences' decisions to click on a headline.
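A minimal sketch, not the paper's pipeline, of how automatically derived headline features can be correlated with popularity; the crude features and the SUPERLATIVES list are illustrative assumptions:

from scipy.stats import spearmanr

SUPERLATIVES = {"best", "worst", "most", "least"}  # hypothetical style-cue list

def headline_features(headline):
    tokens = headline.lower().split()
    return {
        "length": len(tokens),
        "is_question": int(headline.strip().endswith("?")),
        "superlatives": sum(t in SUPERLATIVES for t in tokens),
    }

def correlate_with_popularity(headlines, share_counts):
    """Spearman correlation of each headline feature with share counts."""
    rows = [headline_features(h) for h in headlines]
    return {
        name: spearmanr([r[name] for r in rows], share_counts).correlation
        for name in rows[0]
    }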


Sentiment Analysis of Movie Reviews (2): word2vec

@machinelearnbot

This is the continuation of my mini-series on sentiment analysis of movie reviews, which originally appeared on recurrentnull.wordpress.com. Last time, we had a look at how well classical bag-of-words models worked for classification of the Stanford collection of IMDB reviews. As it turned out, the "winner" was Logistic Regression, using both unigrams and bigrams for classification. The best classification accuracy obtained was 0.89. So, bag-of-words models may be surprisingly successful, but they are limited in what they can do.
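For reference, a rough reconstruction (not the author's exact code) of the bag-of-words baseline described above, assuming the Stanford IMDB reviews are already loaded into train/test lists:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

def bow_baseline(train_texts, train_labels, test_texts, test_labels):
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2), min_df=2),  # unigrams and bigrams
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_texts, train_labels)
    preds = model.predict(test_texts)
    return accuracy_score(test_labels, preds)  # roughly the 0.89 reported in the post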