Gao, Dehong (The Hong Kong Polytechnic University) | Wei, Furu (Microsoft Research Asia, Beijing) | Li, Wenjie (The Hong Kong Polytechnic University) | Liu, Xiaohua (Microsoft Research Asia, Beijing) | Zhou, Ming (Microsoft Research Asia, Beijing)
In this paper, we address the issue of bilingual sentiment lexicon learning(BSLL) which aims to automatically and simultaneously generate sentiment words for two languages. The underlying motivation is that sentiment information from two languages can perform iterative mutual-teaching in the learning procedure. We propose to develop two classifiers to determine the sentiment polarities of words under a co-training framework, which makes full use of the two-view sentiment information from the two languages. The word alignment derived from the parallel corpus is leveraged to design effective features and to bridge the learning of the two classifiers. The experimental results on English and Chinese languages show the effectiveness of our approach in BSLL.
With the development of Web 2.0, sentiment analysis has now become a popular research problem to tackle. Recently, topic models have been introduced for the simultaneous analysis for topics and the sentiment in a document. These studies, which jointly model topic and sentiment, take the advantage of the relationship between topics and sentiment, and are shown to be superior to traditional sentiment analysis tools. However, most of them make the assumption that, given the parameters, the sentiments of the words in the document are all independent. In our observation, in contrast, sentiments are expressed in a coherent way. The local conjunctive words, such as “and” or “but”, are often indicative of sentiment transitions. In this paper, we propose a major departure from the previous approaches by making two linked contributions. First, we assume that the sentiments are related to the topic in the document, and put forward a joint sentiment and topic model, i.e. Sentiment-LDA. Second, we observe that sentiments are dependent on local context. Thus, we further extend the Sentiment-LDA model to Dependency-Sentiment-LDA model by relaxing the sentiment independent assumption in Sentiment-LDA. The sentiments of words are viewed as a Markov chain in Dependency-Sentiment-LDA. Through experiments, we show that exploiting the sentiment dependency is clearly advantageous, and that the Dependency-Sentiment-LDA is an effective approach for sentiment analysis.
Microblog sentiment classification is an important research topic which has wide applications in both academia and industry. Because microblog messages are short, noisy and contain masses of acronyms and informal words, microblog sentiment classification is a very challenging task. Fortunately, collectively the contextual information about these idiosyncratic words provide knowledge about their sentiment orientations. In this paper, we propose to use the microblogs' contextual knowledge mined from a large amount of unlabeled data to help improve microblog sentiment classification. We define two kinds of contextual knowledge: word-word association and word-sentiment association. The contextual knowledge is formulated as regularization terms in supervised learning algorithms. An efficient optimization procedure is proposed to learn the model. Experimental results on benchmark datasets show that our method can consistently and significantly outperform the state-of-the-art methods.
The AFINN lexicon is perhaps one of the simplest and most popular lexicons that can be used extensively for sentiment analysis. The current version of the lexicon is AFINN-en-165. You can find this lexicon at the author's official GitHub repository. The author has also created a nice wrapper library on top of this in Python called afinn, which we will be using for our analysis. Let's look at some visualisations now.
The word soft may evoke positive connotations of warmth and cuddliness in many contexts, but calling a hockey player soft would be an insult. If you were to say something was terrific in the 1800s, this would probably imply that it was terrifying and awe-inspiring; today, terrific basically just implies that something is (pretty) good.