Information Extraction


Domain Adaptation in Sentiment Analysis of Twitter

AAAI Conferences

This paper focuses on performing Sentiment Analysis of Twitter by adapting data from other domains, commonly referred to as Domain Adaptation. We show that Domain Adaptation is useful in predicting sentiments, and we propose different techniques to select an out-of-domain data source that would aid in Sentiment Analysis. Additionally, we suggest two iterative algorithms based on Expectation-Maximization (EM) and Rocchio SVM that filter noisy data during adaptation and train only on valid data. Finally, we explore two metrics, Mutual Information and Cosine distance, to measure the similarity between different domains of data. We use Twitter and Blippr as data sources and perform binary sentiment (positive and negative) classification.
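As an illustration of the Cosine measure mentioned in this abstract, the sketch below ranks candidate out-of-domain corpora by the cosine similarity of their term-count vectors to the target (Twitter) corpus; cosine distance is one minus this value. It assumes plain whitespace tokenization and raw term counts, and the function names (`term_vector`, `cosine_similarity`, `rank_sources`) and toy corpora are hypothetical. The paper's exact feature representation and its Mutual Information measure are not reproduced here.

```python
import math
from collections import Counter


def term_vector(docs):
    """Bag-of-words term counts over a whole corpus (lowercased, whitespace-split)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(target_docs, sources):
    """Rank candidate source corpora (name -> docs) by similarity to the target corpus."""
    target = term_vector(target_docs)
    scores = {name: cosine_similarity(target, term_vector(docs))
              for name, docs in sources.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tweets = ["loved this movie", "worst movie ever"]
    sources = {
        "movie_reviews": ["a great movie", "the movie was boring"],
        "electronics": ["battery died fast", "great screen"],
    }
    # movie_reviews ranks first; electronics shares no terms with the tweets and scores 0.0
    print(rank_sources(tweets, sources))
```

A source with a higher score (lower cosine distance) would be the preferred candidate for adaptation under this simple criterion.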


The Stock Sonar: Sentiment Analysis of Stocks Based on a Hybrid Approach

AAAI Conferences

The Stock Sonar (TSS) is a stock sentiment analysis application based on a novel hybrid approach. While previous work focused on document-level sentiment classification, or extracted only generic sentiment at the phrase level, TSS integrates sentiment dictionaries, phrase-level compositional patterns, and predicate-level semantic events. TSS generates precise in-text sentiment tagging as well as sentiment-oriented event summaries for a given stock, which are also aggregated into sentiment scores. Hence, TSS allows investors to get the essence of thousands of articles every day and may help them make timely, informed trading decisions. The extracted sentiment is also shown to improve the accuracy of an existing document-level sentiment classifier.
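To make the combination of a sentiment dictionary with a phrase-level compositional pattern concrete, here is a toy sketch: dictionary hits are tagged +1/-1, a single hypothetical rule flips polarity when a negator immediately precedes the hit, and the tags are averaged into a per-stock score. The lexicon, negator list, and function names are invented for illustration; TSS's actual dictionaries, patterns, and predicate-level event extraction are much richer and are not reproduced here.

```python
# Hypothetical toy lexicon and negators; not TSS's real resources.
LEXICON = {"gain": 1, "beat": 1, "strong": 1, "loss": -1, "miss": -1, "weak": -1}
NEGATORS = {"not", "no", "never"}


def phrase_polarities(sentence):
    """Tag each lexicon hit with +1/-1, flipping it if a negator directly precedes it."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    tags = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                polarity = -polarity  # compositional rule: negation flips polarity
            tags.append((tok, polarity))
    return tags


def stock_score(sentences):
    """Aggregate all tags into a score in [-1, 1] (mean polarity; 0.0 if no hits)."""
    hits = [p for s in sentences for _, p in phrase_polarities(s)]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    news = ["Revenue beat estimates.", "Margins were not weak."]
    print(phrase_polarities(news[1]))
    print(stock_score(news))
```

A single-token negation window is the crudest possible compositional pattern; it is shown only to illustrate how phrase-level rules can override raw dictionary polarity.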


Transfer Learning for Multiple-Domain Sentiment Analysis: Identifying Domain Dependent/Independent Word Polarity

AAAI Conferences

Sentiment analysis is the task of determining the attitude (positive or negative) of documents. While the polarity of words in the documents is informative for this task, the polarity of some words cannot be determined without domain knowledge. Detecting word polarity thus poses a challenge for multiple-domain sentiment analysis. Previous approaches tackle this problem with transfer learning techniques, but they cannot handle multiple source domains and multiple target domains. This paper proposes a novel Bayesian probabilistic model to handle multiple source and multiple target domains. In this model, each word is associated with three factors: domain label, domain dependence/independence, and word polarity. We derive an efficient algorithm using Gibbs sampling for inferring the parameters of the model from both labeled and unlabeled texts. Using real data, we demonstrate the effectiveness of our model in a document polarity classification task, compared with a method that does not consider the differences between domains. Moreover, our method can also tell whether each word's polarity is domain-dependent or domain-independent. This feature allows us to construct a word polarity dictionary for each domain.


Identifying Evaluative Sentences in Online Discussions

AAAI Conferences

Much opinion mining research focuses on product reviews because reviews are opinion-rich and contain little irrelevant information. However, the same cannot be said of online discussions and comments. In such postings, discussions can become highly emotional and heated, with many emotional statements and even personal attacks. As a result, many of the postings and sentences do not express positive or negative opinions about the topic being discussed. To find people's opinions on a topic and its different aspects, which we call evaluative opinions, these irrelevant sentences should be removed. The goal of this research is thus to identify evaluative opinion sentences. A novel unsupervised approach is proposed to solve the problem, and our experimental results show that it performs well.


Learning to Identify Review Spam

AAAI Conferences

In the past few years, sentiment analysis and opinion mining have become popular and important tasks. These studies all assume that their opinion resources are genuine and trustworthy. However, they may encounter fake opinions, also known as opinion spam. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write fake reviews, called review spam, to promote their own products or defame their competitors' products. It is important to identify and filter out such review spam. Previous work relies only on heuristic rules, such as helpfulness voting or rating deviation, which limits performance on this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features on spam identification. We also observe that review spammers consistently write spam. This provides another view for identifying review spam: we can identify whether the author of a review is a spammer. Based on this observation, we provide a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. Experimental results show that our proposed method is effective: our machine learning methods achieve significant improvements over the heuristic baselines.


Semi-Supervised Learning for Imbalanced Sentiment Classification

AAAI Conferences

Various semi-supervised learning methods have been proposed recently to solve the long-standing shortage of manually labeled data in sentiment classification. However, most existing studies assume a balance between negative and positive samples in both the labeled and unlabeled data, which may not be true in reality. In this paper, we investigate a more common case: semi-supervised learning for imbalanced sentiment classification.

Trained on imbalanced labeled data, most classification algorithms tend to predict test samples as the majority class and may ignore the minority class. Although many methods, such as re-sampling [Chawla et al., 2002], one-class classification [Juszczak and Duin, 2003], and cost-sensitive learning [Zhou and Liu, 2006], have been proposed to solve this issue, it is still unclear which method is more suitable for handling the imbalanced problem in sentiment classification and whether the method is extendable to …
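Re-sampling is the simplest of the remedies cited in this entry. The sketch below performs plain random oversampling: it duplicates randomly drawn minority-class examples until every class matches the majority count. It is not SMOTE [Chawla et al., 2002], which synthesizes new minority examples by interpolating between neighbors, nor the semi-supervised method of this paper; the function name and `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes
    have as many samples as the largest one. Original data is kept in order."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    reviews = ["great", "fine", "loved it", "nice", "awful"]
    polarity = ["pos", "pos", "pos", "pos", "neg"]
    xs, ys = random_oversample(reviews, polarity)
    print(list(zip(xs, ys)))
```

Oversampling by duplication balances the class counts a classifier sees, at the cost of possible overfitting to the repeated minority examples; that trade-off is one reason alternatives such as SMOTE and cost-sensitive learning exist.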


Incorporating Reviewer and Product Information for Review Rating Prediction

AAAI Conferences

Traditional sentiment analysis mainly considers binary classification of reviews, but in many real-world sentiment classification problems, non-binary review ratings are more useful. This is especially true when consumers wish to compare two products, neither of which is negative. Previous work has addressed this problem by extracting various features from the review text for learning a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately.

We call this task the rating-inference task: it determines an author's polarity evaluation on a multipoint scale (e.g., one to five "stars"). We explore solutions for this task in the context of product or service reviews, which are one of the most important opinion resources and are widely used by customers and companies. We observe that in many real-world scenarios it is important to provide numerical ratings rather than binary decisions, especially when a customer compares several candidate products, all of which would be positive under binary classification, to make a purchase decision: customers need to know not only whether a product is good, but also how good it is. A recent study pointed out that many consumers are willing to pay at least 20% …
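One simple way to capture reviewer- and product-dependent effects is an additive bias baseline: predicted rating = global mean + reviewer bias + product bias, fitted here by alternating regularized averages. This is a common collaborative-filtering-style baseline, not the paper's model, which additionally lets the sentiment effect of individual words vary by reviewer and product. The function names, the `reg` shrinkage parameter, and the iteration count are hypothetical choices.

```python
from collections import defaultdict


def fit_biases(ratings, n_iters=10, reg=1.0):
    """Fit rating(u, p) ~ mu + b_user[u] + b_prod[p] by alternating updates.

    ratings: list of (reviewer, product, stars) triples.
    reg: shrinks biases of reviewers/products with few ratings toward 0.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_user = defaultdict(float)
    b_prod = defaultdict(float)
    for _ in range(n_iters):
        # Reviewer biases, holding product biases fixed.
        num, cnt = defaultdict(float), defaultdict(int)
        for u, p, r in ratings:
            num[u] += r - mu - b_prod[p]
            cnt[u] += 1
        for u in num:
            b_user[u] = num[u] / (cnt[u] + reg)
        # Product biases, holding reviewer biases fixed.
        num, cnt = defaultdict(float), defaultdict(int)
        for u, p, r in ratings:
            num[p] += r - mu - b_user[u]
            cnt[p] += 1
        for p in num:
            b_prod[p] = num[p] / (cnt[p] + reg)
    return mu, dict(b_user), dict(b_prod)


def predict(mu, b_user, b_prod, reviewer, product):
    """Unseen reviewers or products fall back to a zero bias."""
    return mu + b_user.get(reviewer, 0.0) + b_prod.get(product, 0.0)


if __name__ == "__main__":
    data = [("harsh", "A", 3), ("harsh", "B", 2), ("kind", "A", 5), ("kind", "B", 4)]
    mu, bu, bp = fit_biases(data)
    print(mu, bu, bp)
    print(predict(mu, bu, bp, "harsh", "A"))
```

In this toy data the "harsh" reviewer receives a negative bias and product A a positive one, so the same text-independent baseline already separates who is rating from what is being rated.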


Open Information Extraction: The Second Generation

AAAI Conferences

How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm, which eschews hand-labeled training examples and avoids domain-specific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both common-sense knowledge and novel question-answering systems. This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences and double the precision/recall of previous systems such as TEXTRUNNER and WOE.


Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena

AAAI Conferences

We perform a sentiment analysis of all tweets published on the microblogging platform Twitter in the second half of 2008. We use a psychometric instrument to extract six mood states (tension, depression, anger, vigor, fatigue, confusion) from the aggregated Twitter content and compute a six-dimensional mood vector for each day in the timeline. We compare our results to a record of popular events gathered from media and sources. We find that events in the social, political, cultural and economic spheres do have a significant, immediate and highly specific effect on the various dimensions of public mood. We speculate that large-scale analyses of mood can provide a solid platform to model collective emotive trends in terms of their predictive value with regard to existing social as well as economic indicators.
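The per-day mood vector can be sketched as lexicon-based counting: each tweet's tokens are matched against terms assigned to the six mood dimensions, counts are summed per day, and each day's counts are normalized to proportions. The lexicon below is a hypothetical stand-in (the psychometric instrument's actual terms are not reproduced), and proportional normalization is an assumption; the paper's own scoring and normalization may differ.

```python
from collections import defaultdict

MOOD_DIMENSIONS = ("tension", "depression", "anger", "vigor", "fatigue", "confusion")

# Hypothetical toy lexicon: each term maps to exactly one mood dimension.
MOOD_LEXICON = {
    "anxious": "tension", "nervous": "tension",
    "sad": "depression", "hopeless": "depression",
    "angry": "anger", "furious": "anger",
    "energetic": "vigor", "lively": "vigor",
    "tired": "fatigue", "exhausted": "fatigue",
    "confused": "confusion", "puzzled": "confusion",
}


def daily_mood_vectors(tweets):
    """tweets: iterable of (day, text). Returns {day: 6-tuple of mood proportions}."""
    counts = defaultdict(lambda: [0] * len(MOOD_DIMENSIONS))
    for day, text in tweets:
        vec = counts[day]  # creates an all-zero vector for days with no mood terms
        for tok in text.lower().split():
            dim = MOOD_LEXICON.get(tok.strip(".,!?"))
            if dim is not None:
                vec[MOOD_DIMENSIONS.index(dim)] += 1
    result = {}
    for day, vec in counts.items():
        total = sum(vec)
        result[day] = tuple(v / total if total else 0.0 for v in vec)
    return result


if __name__ == "__main__":
    sample = [
        ("2008-10-01", "so tired and sad today"),
        ("2008-10-01", "feeling energetic!"),
        ("2008-10-02", "nothing to report"),
    ]
    for day, vector in sorted(daily_mood_vectors(sample).items()):
        print(day, dict(zip(MOOD_DIMENSIONS, vector)))
```

The resulting daily vectors form six time series that could then be compared against a timeline of events.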


Sentiment Flow Through Hyperlink Networks

AAAI Conferences

How does sentiment flow through hyperlink networks? Earlier work on hyperlink networks has focused on the structure of the network, often modeling posts as nodes in a directed graph in which edges represent hyperlinks. At the same time, sentiment analysis has largely focused on classifying texts in isolation. Here we analyze a large hyperlinked network of mass media and weblog posts to determine how the sentiment features of a post affect the sentiment of connected posts and the structure of the network itself. We explore the phenomenon of sentiment flow through experiments on a graph containing nearly 8 million nodes and 15 million edges. Our analysis indicates that (1) nodes are strongly influenced by their immediate neighbors, (2) deep cascades lead complex but predictable lives, (3) shallow cascades tend to be objective, and (4) sentiment becomes more polarized as depth increases.
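A minimal way to relate sentiment to cascade depth is to assign each post its breadth-first depth from the cascade root and average a polarity measure per depth. The sketch uses mean absolute polarity as a stand-in for "how polarized"; that measure, the graph encoding (a dict from each post to the posts that link to it), and the function name are hypothetical, not the paper's methodology.

```python
from collections import defaultdict, deque


def depth_profile(edges, roots, polarity):
    """Mean absolute sentiment per cascade depth.

    edges: dict mapping a post to the posts that link to it (its cascade children).
    roots: posts that start cascades (depth 0).
    polarity: dict mapping post -> sentiment score in [-1, 1]; missing posts count as 0.
    """
    depth = {}
    queue = deque()
    for r in roots:
        depth[r] = 0
        queue.append(r)
    # Breadth-first search assigns each reachable post its shortest depth from a root.
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    totals, counts = defaultdict(float), defaultdict(int)
    for node, d in depth.items():
        totals[d] += abs(polarity.get(node, 0.0))
        counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


if __name__ == "__main__":
    links = {"root": ["a", "b"], "a": ["c"]}
    scores = {"root": 0.1, "a": 0.4, "b": -0.6, "c": 0.9}
    print(depth_profile(links, ["root"], scores))
```

Using absolute values means a depth level full of strongly negative and strongly positive posts still counts as highly polarized, which is the property the fourth finding refers to.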