In this paper, we propose a new supervised topic model that incorporates user and item information. The proposed model can simultaneously utilize textual topics and user-item factors for label prediction. We conduct prediction experiments on a public review dataset. The results demonstrate the advantages of our model, which shows clear improvements over traditional supervised topic models and recommendation methods.
Probabilistic topic models have been widely used for sentiment analysis. However, most existing topic models only model the sentiment text; they do not consider the user who expresses the sentiment or the item on which the sentiment is expressed. Since different users may use different sentiment expressions for different items, we argue that it is better to incorporate user and item information into the topic model for sentiment analysis. In this paper, we propose a new Supervised User-Item based Topic model, called the SUIT model, for sentiment analysis. It can simultaneously utilize textual topics and latent user-item factors. Our method uses the tensor outer product of the text topic proportion vector, the user latent factor, and the item latent factor to model sentiment label generation. Extensive experiments are conducted on two datasets: a review dataset and a microblog dataset. The results demonstrate the advantages of our model, which shows significant improvements over supervised topic models and collaborative filtering methods.
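The abstract does not spell out SUIT's full generative process. As a rough illustration only, the following sketch assumes a hypothetical per-label weight tensor `W[c]` that is contracted with the outer product θ ⊗ u ⊗ v (topic proportions, user factor, item factor), followed by a softmax over labels. It is not the authors' model, and inference of θ, u, and v is omitted.

```python
import math


def label_scores(theta, u, v, W):
    """Score each label c as sum_{i,j,k} W[c][i][j][k] * theta[i] * u[j] * v[k],
    i.e. the inner product of W[c] with the outer product theta x u x v.
    W is a hypothetical weight tensor, one 3-D slice per label."""
    scores = []
    for Wc in W:
        s = 0.0
        for i, t in enumerate(theta):
            for j, uj in enumerate(u):
                for k, vk in enumerate(v):
                    s += Wc[i][j][k] * t * uj * vk
        scores.append(s)
    return scores


def label_probs(theta, u, v, W):
    """Softmax over the label scores (max subtracted for numerical stability)."""
    scores = label_scores(theta, u, v, W)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def zero_tensor(a, b, c):
    """An a x b x c nested list of zeros."""
    return [[[0.0] * c for _ in range(b)] for _ in range(a)]


# Usage: 2 topics, 2 user factors, 2 item factors, 2 labels (all values made up).
theta = [0.7, 0.3]   # topic proportions of the review text
u = [1.0, -0.5]      # user latent factor
v = [0.2, 0.4]       # item latent factor
W = [zero_tensor(2, 2, 2), zero_tensor(2, 2, 2)]
W[1][0][0][0] = 1.0  # label 1 rewards topic 0 x user factor 0 x item factor 0
probs = label_probs(theta, u, v, W)
print(probs)
```

Because each entry of the outer product couples one topic weight with one user factor and one item factor, the same topic can push the label in different directions for different user-item pairs, which matches the abstract's motivation that different users express sentiment differently on different items.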
Song, Kaisong (Northeastern University) | Feng, Shi (Northeastern University) | Gao, Wei (Qatar Computing Research Institute) | Wang, Daling (Northeastern University) | Yu, Ge (Northeastern University) | Wong, Kam-Fai (The Chinese University of Hong Kong)
Sentiment expression in microblog posts often reflects a user's individuality, owing to differences in language habits, personal character, opinion bias, and so on. Existing sentiment classification algorithms largely ignore such latent personal distinctions among microblog users. Meanwhile, the sentiment data of individual microblog users are sparse, making it infeasible to learn an effective personalized classifier. In this paper, we propose a novel, extensible personalized sentiment classification method based on a variant of the latent factor model, which captures personal sentiment variations by mapping users and posts into a low-dimensional factor space. We alleviate the sparsity of personal texts by decomposing posts into words, which are further represented by weighted sentiment and topic units derived from a set of syntactic units of words obtained from dependency parsing results. To strengthen the representation of users, we leverage the following relations among users, consolidating a user's individuality with information fused from other users who have similar interests. Results on real-world microblog datasets confirm that our method outperforms state-of-the-art baseline algorithms by large margins.
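The abstract does not give the model's equations, so the following is only a toy sketch under stated assumptions. A post is scored for a user as the dot product of a user factor and a post factor. The post factor is the mean of hypothetical word factors (standing in for the decomposition of posts into words), and the user factor is linearly blended with the mean factor of the users they follow, using a hypothetical weight `alpha`. The dependency-parsing-based sentiment and topic units are not modeled.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def post_factor(words, word_factors, dim):
    """Mean latent factor of the known words in a post (zero vector if none are known)."""
    vecs = [word_factors[w] for w in words if w in word_factors]
    if not vecs:
        return [0.0] * dim
    return [sum(vec[d] for vec in vecs) / len(vecs) for d in range(dim)]


def fused_user_factor(user, user_factors, following, alpha):
    """Blend a user's own factor with the mean factor of the users they follow.
    The linear blend with weight alpha is an illustrative assumption."""
    own = user_factors[user]
    followed = [user_factors[f] for f in following.get(user, []) if f in user_factors]
    if not followed:
        return list(own)
    mean = [sum(vec[d] for vec in followed) / len(followed) for d in range(len(own))]
    return [(1 - alpha) * o + alpha * m for o, m in zip(own, mean)]


def personalized_score(user, words, user_factors, word_factors, following, alpha=0.3):
    """Toy decision rule: positive score -> positive sentiment, negative -> negative."""
    u = fused_user_factor(user, user_factors, following, alpha)
    return dot(u, post_factor(words, word_factors, len(u)))


# Usage with made-up 2-dimensional factors.
user_factors = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
word_factors = {"good": [1.0, 1.0], "bad": [-1.0, -1.0]}
following = {"alice": ["bob"]}  # alice follows bob
print(personalized_score("alice", ["good"], user_factors, word_factors, following, alpha=0.5))  # prints 1.0
```

Blending in followed users' factors gives a user with few labeled posts a less sparse representation, which is the role the abstract assigns to the following relation.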
Li, Fangtao (Tsinghua University) | Liu, Nathan Nan (Hong Kong University of Science and Technology) | Jin, Hongwei (State Key Laboratory of Intelligent Technology and Systems) | Zhao, Kai (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology) | Zhu, Xiaoyan (State Key Laboratory of Intelligent Technology and Systems)
Among sentiment analysis tasks, review rating prediction is more helpful than binary (positive and negative) classification, especially when consumers want to compare two good products. Previous work has addressed this problem by extracting various features from the review text to learn a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately. In this paper, we propose a novel learning framework that incorporates reviewer and product information into a text-based learner for rating prediction. The reviewer, product, and text features are modeled as a three-dimensional tensor, and tensor factorization is employed to alleviate the sparsity and complexity problems. The experimental results demonstrate the effectiveness of our model: we achieve significant improvements over state-of-the-art methods, especially for reviews of unpopular products and reviews by inactive reviewers.
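The abstract does not say which factorization is used. Assuming a rank-F CP (PARAFAC) decomposition of a reviewer × product × word weight tensor, a linear rating predictor could be sketched as follows. The factor matrices `R`, `P`, `T`, the bag-of-words input, and the bias term are illustrative assumptions, not the paper's formulation.

```python
def cp_weight(R, P, T, r, p, w):
    """Entry W[r][p][w] of a rank-F CP tensor: sum_f R[r][f] * P[p][f] * T[w][f]."""
    return sum(a * b * c for a, b, c in zip(R[r], P[p], T[w]))


def predict_rating(R, P, T, r, p, bag_of_words, bias=0.0):
    """Linear rating: bias + sum_w x_w * W[r][p][w], where bag_of_words maps
    word index -> feature value (e.g. a count)."""
    return bias + sum(x * cp_weight(R, P, T, r, p, w) for w, x in bag_of_words.items())


# Usage: 1 reviewer, 1 product, 2 words, rank F = 2 (all values made up).
R = [[1.0, 0.5]]               # reviewer factors
P = [[2.0, 1.0]]               # product factors
T = [[0.5, 0.0], [0.0, -1.0]]  # word factors for word 0 and word 1
print(predict_rating(R, P, T, 0, 0, {0: 2, 1: 1}, bias=3.0))  # prints 4.5
```

Because every tensor entry is rebuilt from shared per-reviewer, per-product, and per-word factors, a reviewer-product pair with few or no training reviews still receives word weights. This is one way such a factorization can mitigate the sparsity of the full three-way tensor, and it stores (reviewers + products + words) × F parameters instead of reviewers × products × words.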
With the development of Web 2.0, sentiment analysis has become a popular research problem. Recently, topic models have been introduced for the simultaneous analysis of topics and sentiment in a document. These studies, which jointly model topic and sentiment, take advantage of the relationship between topics and sentiment, and have been shown to be superior to traditional sentiment analysis tools. However, most of them assume that, given the parameters, the sentiments of the words in a document are all independent. In contrast, we observe that sentiments are expressed coherently. Local conjunctive words, such as “and” or “but”, often indicate sentiment transitions. In this paper, we propose a major departure from previous approaches by making two linked contributions. First, we assume that sentiments are related to the topics in the document, and put forward a joint sentiment and topic model, i.e., Sentiment-LDA. Second, we observe that sentiments depend on local context. Thus, we further extend Sentiment-LDA to the Dependency-Sentiment-LDA model by relaxing the sentiment independence assumption of Sentiment-LDA. In Dependency-Sentiment-LDA, the sentiments of words are viewed as a Markov chain. Through experiments, we show that exploiting sentiment dependency is clearly advantageous, and that Dependency-Sentiment-LDA is an effective approach for sentiment analysis.
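To illustrate what relaxing the independence assumption means, the following toy sketch scores a sequence of word sentiment labels with a first-order Markov chain whose probability of keeping the previous label depends on the connective between the two words. The `KEEP_PROB` table and the uniform initial distribution are hypothetical values chosen for illustration; topics are ignored, and this is not the paper's parameterization.

```python
# Hypothetical probability that a word keeps the previous word's sentiment label,
# keyed by the connective between them (None = no connective).
KEEP_PROB = {"and": 0.9, "but": 0.1, None: 0.7}


def sequence_prob(labels, connectives, initial=None):
    """Probability of a sentiment label sequence under a first-order Markov chain.

    labels[t] is 'pos' or 'neg'; connectives[t] is the connective between
    word t-1 and word t (connectives[0] is ignored). Connectives not listed
    in KEEP_PROB fall back to the None entry.
    """
    if initial is None:
        initial = {"pos": 0.5, "neg": 0.5}
    prob = initial[labels[0]]
    for t in range(1, len(labels)):
        keep = KEEP_PROB.get(connectives[t], KEEP_PROB[None])
        # Multiply by the keep probability if the label repeats, else by the switch probability.
        prob *= keep if labels[t] == labels[t - 1] else 1.0 - keep
    return prob


# "good but expensive": a switch after "but" is far more likely than no switch.
print(sequence_prob(["pos", "neg"], [None, "but"]))  # switch after "but"
print(sequence_prob(["pos", "pos"], [None, "but"]))  # no switch after "but"
```

Under an independence assumption, both sequences above would receive the same probability from the label prior alone; the chain instead lets “but” make a sentiment switch the likely reading, which is the intuition behind modeling local dependency.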