User-generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinions in their posts on blogs, message boards, and similar venues with respect to topics expressed as a search query. In the scenario we consider, the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment and their scores are aggregated (using a data structure we introduce called the sentiment propagation graph) to produce an aggregate score for the input entity. A crucial part of the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions; this is the focus of our paper. We present an approach in which potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines), where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussion blogs annotated with semantic types of mentions, coreference, meronymy and sentiment targets. The corpus shows that coreference and meronymy are not marginal phenomena but are central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results which we believe advance the current state of the art.
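The target-ranking step described above can be illustrated with a minimal sketch. It assumes a linear scoring function over typed dependency paths; the weights, the path strings, the distance penalty, and the names `score_candidate` and `rank_targets` are all hypothetical stand-ins for what a trained SVM ranker would learn, not the authors' actual features or model.

```python
from typing import Dict, List, Tuple

# Hypothetical weights standing in for a trained linear SVM ranker.
# Keys are typed dependency paths from the sentiment expression to a mention.
PATH_WEIGHTS: Dict[str, float] = {
    "amod": 2.0,        # sentiment adjective directly modifies the mention
    "nsubj": 1.5,       # mention is the subject of the sentiment predicate
    "dobj": 1.0,        # mention is the object of the sentiment predicate
    "nsubj>conj": 0.5,  # mention is reached through a conjunction
}
UNSEEN_PATH_WEIGHT = -1.0  # assumed default for paths with no learned weight
DISTANCE_PENALTY = 0.1     # assumed penalty per token of separation


def score_candidate(path: str, token_distance: int) -> float:
    """Linear score: path weight minus a penalty that grows with distance."""
    return PATH_WEIGHTS.get(path, UNSEEN_PATH_WEIGHT) - DISTANCE_PENALTY * token_distance


def rank_targets(candidates: List[Tuple[str, str, int]]) -> List[Tuple[str, float]]:
    """Rank (mention, path, distance) candidates for one sentiment expression."""
    scored = [(mention, score_candidate(path, dist)) for mention, path, dist in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Candidate mentions for a single sentiment expression (illustrative data).
candidates = [
    ("store", "prep>pobj", 8),
    ("camera", "nsubj", 2),
    ("battery", "nsubj>conj", 5),
]
ranking = rank_targets(candidates)
for mention, score in ranking:
    print(f"{mention}: {score:.2f}")
```

In a real system the weights would come from training on the annotated corpus; the sketch only shows how dependency-path features can turn target selection into a ranking problem.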
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network.
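The abstract does not spell out the composition function, so the following is only a sketch under an assumption: that a parent vector is computed from two child vectors as tanh of a bilinear tensor term plus a standard linear term, which is how the Recursive Neural Tensor Network is usually presented. The function name `rntn_compose`, the tiny dimension, and the parameter values are illustrative, not trained.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rntn_compose(a: Vector, b: Vector, V: List[Matrix], W: Matrix) -> Vector:
    """Compose two d-dimensional child vectors into one d-dimensional parent.

    Assumed form: parent[k] = tanh(x^T V[k] x + W[k] . x), with x = [a; b].
    V holds d slices of shape 2d x 2d; W has shape d x 2d.
    """
    x = a + b  # list concatenation gives the stacked child vector of length 2d
    n = len(x)
    parent = []
    for V_k, W_k in zip(V, W):
        bilinear = sum(x[i] * V_k[i][j] * x[j] for i in range(n) for j in range(n))
        linear = sum(w * x_i for w, x_i in zip(W_k, x))
        parent.append(math.tanh(bilinear + linear))
    return parent


# Illustrative parameters for d = 2 (hypothetical values, not learned).
d = 2
a = [0.5, -0.2]
b = [0.1, 0.3]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
V = [[[0.0] * (2 * d) for _ in range(2 * d)] for _ in range(d)]
V[0][0][2] = 1.0  # lets the first output unit see the product a[0] * b[0]

parent = rntn_compose(a, b, V, W)
print(parent)
```

Applying the same function bottom-up over a parse tree yields a vector for every phrase, which is where the phrase-level labels of the treebank would provide supervision.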
Political discourse in the United States is becoming increasingly polarized. This polarization frequently causes different communities to react very differently to the same news events. Political blogs, as a form of social media, provide a unique insight into this phenomenon. We present a multitarget, semisupervised latent variable model, MCR-LDA, to model this process by jointly analyzing political blog posts and their comment sections from different political communities to predict the degree of polarization that news topics cause. Inspecting the model after inference reveals topics and the degree to which each triggers polarization. In this approach, community responses to news topics are observed using sentiment polarity and comment volume, which serves as a proxy for the level of interest in the topic. In this context, we also present computational methods to assign sentiment polarity to the comments, which serve as targets for latent variable models that predict the polarity based on the topics in the blog content. Our results show that the joint modeling of communities with different political beliefs using MCR-LDA does not sacrifice accuracy in sentiment polarity prediction when compared to approaches that are tailored to specific communities, and additionally provides a view of the polarization in responses from the different communities.
We examine the query planning problem in information integration systems in the presence of sources that contain disjunctive information. We show that datalog, the language of choice for representing query plans in information integration systems, is not sufficiently expressive in this case. We prove that disjunctive datalog with inequality is sufficiently expressive, and present a construction of query plans that are guaranteed to extract all available information from disjunctive sources.
1 Introduction
We examine the query planning problem in information integration systems in the presence of sources that contain disjunctive information. The query planning problem in such systems can be formally stated as the problem of answering queries using views (Levy et al. 1995; Ullman 1997; Duschka & Genesereth 1997a): view definitions describe the information stored by sources, and query planning requires rewriting a query into one that uses only these views. In this paper we extend the algorithm for answering queries using conjunctive views introduced in (Duschka & Genesereth 1997a) so that it can also handle disjunction in the view definitions.