Goto

Collaborating Authors

 Information Extraction


Microblog Sentiment Classification with Contextual Knowledge Regularization

AAAI Conferences

Microblog sentiment classification is an important research topic which has wide applications in both academia and industry. Because microblog messages are short, noisy and contain masses of acronyms and informal words, microblog sentiment classification is a very challenging task. Fortunately, collectively the contextual information about these idiosyncratic words provide knowledge about their sentiment orientations. In this paper, we propose to use the microblogs' contextual knowledge mined from a large amount of unlabeled data to help improve microblog sentiment classification. We define two kinds of contextual knowledge: word-word association and word-sentiment association. The contextual knowledge is formulated as regularization terms in supervised learning algorithms. An efficient optimization procedure is proposed to learn the model. Experimental results on benchmark datasets show that our method can consistently and significantly outperform the state-of-the-art methods.


Exploring Social Context for Topic Identification in Short and Noisy Texts

AAAI Conferences

With the pervasion of social media, topic identification in short texts attracts increasing attention in  recent years. However, in nature the texts of social media are short and noisy, and the structures are sparse and dynamic, resulting in difficulty to identify topic categories exactly from online social media. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploring social context from the perspective of social sciences. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and conduct feature selection to tackle short and noisy texts in social media, which result in a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of social context in topic identification.


Representation Learning for Aspect Category Detection in Online Reviews

AAAI Conferences

User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., “food” and “service” in restaurant reviews) is an important task of sentiment analysis and opinion mining. Given a predefined aspect category set, most previous researches leverage hand-crafted features and a classification algorithm to accomplish the task. The crucial step to achieve better performance is feature engineering which consumes much human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach to automatically learn useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations on a large set of reviews with noisy labels. Afterwards, we propose to generate deeper and hybrid features through neural networks stacked on the word vectors. A logistic regression classifier is finally trained with the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves the state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.


Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

AAAI Conferences

Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users are increasingly using images and videos to express their opinions and share their experiences. Sentiment analysis of such large scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis. Motivated by the needs in leveraging large scale yet noisy training data to solve the extremely challenging problem of image sentiment analysis, we employ Convolutional Neural Networks (CNN). We first design a suitable CNN architecture for image sentiment analysis. We obtain half a million training samples by using a baseline sentiment algorithm to label Flickr images. To make use of such noisy machine labeled data, we employ a progressive strategy to fine-tune the deep network. Furthermore, we improve the performance on Twitter images by inducing domain transfer with a small number of manually labeled Twitter images. We have conducted extensive experiments on manually labeled Twitter images. The results show that the proposed CNN can achieve better performance in image sentiment analysis than competing algorithms.


The Hurricane Sandy Twitter Corpus

AAAI Conferences

The growing use of social media has made it a critical component of disaster response and recovery efforts. Both in terms of preparedness and response, public health officials and first responders have turned to automated tools to assist with organizing and visualizing large streams of social media. In turn, this has spurred new research into algorithms for information extraction, event detection and organization, and information visualization. One challenge of these efforts has been the lack of a common corpus for disaster response on which researchers can compare and contrast their work. This paper describes the Hurricane Sandy Twitter Corpus: 6.5 million geotagged Twitter posts from the geographic area and time period of the 2012 Hurricane Sandy.


Toward Social Media Opinion Mining for Sustainability Research

AAAI Conferences

We propose to introduce social media opinion mining research into the field of computational sustainability. Opinion mining from social media can be a faster and less expensive alternative to traditional survey and polling, on which many sustainability research are based. We describe a framework for such analysis, examine the challenges in our proposed framework and current status of research on those challenges. We also propose some possible research directions for tackling these challenges.


Extraction of (Key,Value) Pairs from Unstructured Ads

AAAI Conferences

In this paper, we focus on the problem of extracting structured labeled data from short unstructured ad-postings from online sources like Craigslist, where ads are posted on various topics, such as job postings, rentals, car sales etc. A fundamental challenge in addressing this problem is that most ad-postings are highly unstructured, short-text postings written in an informal manner with no inherent grammar or well-defined dictionary. In this paper, we propose unsupervised and supervised algorithms for extracting structured data from unstructured ads in the form of (key, value) pairs where the keys naturally represent topic-specific features in the ads. The unsupervised algorithm is centered around building an affinity graph, using the words from a topic-specific corpus of such ads where the edge weights represent affinities between words; the (key, value) extraction algorithm identifies specific groups of words in the affinity graph corresponding to different classes of key attributes. The supervised algorithm uses a Conditional Random Field based training algorithm to identify specific structured (key, value) pairs based on pre-defined topic-specific structural data representations of ads. Based on a corpus of car and apartment ad-postings from Craigslist, the unsupervised algorithm reported an accuracy of 67.74% and 68.74% for car and apartment ads respectively. The supervised algorithm demonstrated an improved performance with accuracies of 74.07% and 72.59% respectively.


Domain-Specific Sentiment Classification for Games-Related Tweets

AAAI Conferences

Sentiment classification provides information about the author's feeling toward a topic through the use of expressive words. However, words indicative of a particular sentiment class can be domain-specific. We train a text classifier for Twitter data related to games using labels inferred from emoticons. Our classifier is able to differentiate between positive and negative sentiment tweets labeled by emoticons with 75.1% accuracy. Additionally, we test the classifier on human-labeled examples with the additional case of neutral or ambiguous sentiment. Finally, we have made the data available to the community for further use and analysis.


The Role of Emotions in Propagating Brands in Social Networks

arXiv.org Machine Learning

A key aspect of word of mouth marketing are emotions. Emotions in texts help propagating messages in conventional advertising. In word of mouth scenarios, emotions help to engage consumers and incite to propagate the message further. While the function of emotions in offline marketing in general and word of mouth marketing in particular is rather well understood, online marketing can only offer a limited view on the function of emotions. In this contribution we seek to close this gap. We therefore investigate how emotions function in social media. To do so, we collected more than 30,000 brand marketing messages from the Google+ social networking site. Using state of the art computational linguistics classifiers, we compute the sentiment of these messages. Starting out with Poisson regression-based baseline models, we seek to replicate earlier findings using this large data set. We extend upon earlier research by computing multi-level mixed effects models that compare the function of emotions across different industries. We find that while the well known notion of activating emotions propagating messages holds in general for our data as well. But there are significant differences between the observed industries.


Sentiment Analysis of Short Informal Texts

Journal of Artificial Intelligence Research

We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or a phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task `Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains of up to 6.5 absolute percentage points.