Location-based social sites, such as Foursquare or Yelp, are gaining increasing popularity. These sites allow users to check in at venues and leave a short commentary in the form of a micro-review. Micro-reviews are rich in content as they offer a distilled and concise account of user experience. In this paper we consider the problem of predicting the topic of a micro-review by a user who visits a new venue. Such a prediction can help users make informed decisions, and also help venue owners personalize users' experiences. However, topic modeling for micro-reviews is particularly difficult, due to their short and fragmented nature. We address this issue using pooling strategies, which aggregate micro-reviews at the venue or user level, and we propose novel probabilistic models based on Latent Dirichlet Allocation (LDA) for extracting the topics related to a user-venue pair. Our best topic model integrates influences from both venue inherent properties and user preferences, considering at the same the sentiment orientation of the users. Experimental results on real datasets demonstrate the superiority of this model compared to simpler models and previous work; they also show that venue-inherent properties have higher influences on the topics of micro-reviews.
Mason, Rebecca (Google, Inc.) | Gaska, Benjamin (University of Arizona) | Durme, Benjamin Van (Johns Hopkins University) | Choudhury, Pallavi (Microsoft Research) | Hart, Ted (Microsoft Research) | Dolan, Bill (Microsoft Research) | Toutanova, Kristina (Microsoft Research) | Mitchell, Margaret (Microsoft Research)
Mobile and location-based social media applications provide platforms for users to share brief opinions about products, venues, and services. These quickly typed opinions, or microreviews, are a valuable source of current sentiment on a wide variety of subjects. However, there is currently little research on how to mine this information to present it back to users in easily consumable way. In this paper, we introduce the task of microsummarization, which combines sentiment analysis, summarization, and entity recognition in order to surface key content to users. We explore unsupervised and supervised methods for this task, and find we can reliably extract relevant entities and the sentiment targeted towards them using crowdsourced labels as supervision. In an end-to-end evaluation, we find our best-performing system is vastly preferred by judges over a traditional extractive summarization approach. This work motivates an entirely new approach to summarization, incorporating both sentiment analysis and item extraction for modernized, at-a-glance presentation of public opinion.
Zhang, Wei (Tsinghua University and Tsinghua National Laboratory for Information Science and Technology) | Wang, Jianyong (Tsinghua University and Tsinghua National Laboratory for Information Science and Technology)
User-item connected documents, such as customer reviews for specific items in online shopping website and user tips in location-based social networks, have become more and more prevalent recently. Inferring the topic distributions of user-item connected documents is beneficial for many applications, including document classification and summarization of users and items. While many different topic models have been proposed for modeling multiple text, most of them cannot account for the dual role of user-item connected documents (each document is related to one user and one item simultaneously) in topic distribution generation process. In this paper, we propose a novel probabilistic topic model called Prior-based Dual Additive Latent Dirichlet Allocation (PDA-LDA). It addresses the dual role of each document by associating its Dirichlet prior for topic distribution with user and item topic factors, which leads to a document-level asymmetric Dirichlet prior. In the experiments, we evaluate PDA-LDA on several real datasets and the results demonstrate that our model is effective in comparison to several other models, including held-out perplexity on modeling text and document classification application.
This paper proposes a new HDP based online review rating regression model named Topic-Sentiment-Preference Regression Analysis (TSPRA). TSPRA combines topics (i.e. product aspects), word sentiment and user preference as regression factors, and is able to perform topic clustering, review rating prediction, sentiment analysis and what we invent as "critical aspect" analysis altogether in one framework. TSPRA extends sentiment approaches by integrating the key concept "user preference" in collaborative filtering (CF) models into consideration, while it is distinct from current CF models by decoupling "user preference" and "sentiment" as independent factors. Our experiments conducted on 22 Amazon datasets show overwhelming better performance in rating predication against a state-of-art model FLAME (2015) in terms of error, Pearson's Correlation and number of inverted pairs. For sentiment analysis, we compare the derived word sentiments against a public sentiment resource SenticNet3 and our sentiment estimations clearly make more sense in the context of online reviews. Last, as a result of the de-correlation of "user preference" from "sentiment", TSPRA is able to evaluate a new concept "critical aspects", defined as the product aspects seriously concerned by users but negatively commented in reviews. Improvement to such "critical aspects" could be most effective to enhance user experience.
With the development of Web 2.0, sentiment analysis has now become a popular research problem to tackle. Recently, topic models have been introduced for the simultaneous analysis for topics and the sentiment in a document. These studies, which jointly model topic and sentiment, take the advantage of the relationship between topics and sentiment, and are shown to be superior to traditional sentiment analysis tools. However, most of them make the assumption that, given the parameters, the sentiments of the words in the document are all independent. In our observation, in contrast, sentiments are expressed in a coherent way. The local conjunctive words, such as “and” or “but”, are often indicative of sentiment transitions. In this paper, we propose a major departure from the previous approaches by making two linked contributions. First, we assume that the sentiments are related to the topic in the document, and put forward a joint sentiment and topic model, i.e. Sentiment-LDA. Second, we observe that sentiments are dependent on local context. Thus, we further extend the Sentiment-LDA model to Dependency-Sentiment-LDA model by relaxing the sentiment independent assumption in Sentiment-LDA. The sentiments of words are viewed as a Markov chain in Dependency-Sentiment-LDA. Through experiments, we show that exploiting the sentiment dependency is clearly advantageous, and that the Dependency-Sentiment-LDA is an effective approach for sentiment analysis.