User generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their blog, message board, etc. posts with respect to topics expressed as a search query. In the scenario we consider the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment and their scores are aggregated (using a data structure we introduce call the sentiment propagation graph) to produce an aggregate score for the input entity. An extremely crucial part in the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions --- this is the focus of our paper. We present an approach where potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines) where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussions blogs annotated with semantic types of mentions, coreference, meronymy and sentiment targets. The corpus proves that coreference and meronymy are not marginal phenomena but are really central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results which we believe push the current state-of-the-art.
A large proportion of audiences read news online, often accessing news articles through social media like Facebook or Twitter. A distinguishing characteristic of news on social media is that the most prominent (and often the only visible) part of the news article is the headline. We investigate the impact of headline characteristics, including journalistic concepts of news values and linguistic style, on the article's social media popularity. Using a large corpus of headlines from The Guardian and New York Times we derive these features automatically and correlate with social media popularity on Twitter and Facebook. We found most of them to have a significant effect and that their impact differs between the two social media and between news outlets. Further investigation with a crowdsourced study confirms that news values and style influence the audiences' decisions to click on a headline.
Mining opinions and sentiment from social networking sites is a popular application for social media systems. Common approaches use a machine learning system with a bag of words feature set. We present Delta TFIDF, an intuitive general purpose technique to efficiently weight word scores before classification. Delta TFIDF is easy to compute, implement, and understand. We use Support Vector Machines to show that Delta TFIDF significantly improves accuracy for sentiment analysis problems using three well known data sets.
Tanev, Hristo (Joint Research Centre, European Commission) | Ehrmann, Maud (Joint Research Centre, European Commission) | Piskorski, Jakub (Frontex) | Zavarella, Vanni (Joint Research Centre, European Commission)
We describe a simple IR approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We showed that methods based on utilization of word co-occurrence clustering, domain-specific keywords and named entity recognition improve the performance with respect to a basic approach.
Today, users are reading the news through social platforms. These platforms are built to facilitate crowd engagement, but not necessarily disseminate useful news to inform the masses. Hence, the news that is highly engaged with may not be the news that best informs. While predicting news popularity has been well studied, it has not been studied in the context of crowd manipulations. In this paper, we provide some preliminary results to a longer term project on crowd and platform manipulations of news and news popularity. In particular, we choose to study known features for predicting news popularity and how those features may change on reddit.com, asocial platform used commonly for news aggregation. Along with this, we explore ways in which users can alter the perception of news through changing the title of an article. We find that news on Reddit is predictable using previously studied sentiment and content features and that posts with titles changed by Reddit users tend to be more popular than posts with the original article title.