 Discourse & Dialogue


Learning to Identify Review Spam

AAAI Conferences

In the past few years, sentiment analysis and opinion mining have become popular and important tasks. These studies all assume that their opinion resources are genuine and trustworthy. However, they may encounter fake opinions, also known as opinion spam. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write fake reviews, called review spam, to promote their own products or to defame their competitors' products. It is important to identify and filter out review spam. Previous work focuses only on heuristic rules, such as helpfulness voting or rating deviation, which limits performance on this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features in spam identification. We also observe that review spammers consistently write spam. This provides another view for identifying review spam: we can identify whether the author of a review is a spammer. Based on this observation, we provide a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that our proposed method is effective. Our machine learning methods achieve significant improvements over the heuristic baselines.
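
As a rough illustration of the two-view co-training idea described in the abstract above, the sketch below trains one classifier per view (for example, review features and author features) and lets each view add its most confident prediction on unlabeled data to a shared labeled pool. The nearest-centroid classifier, the margin-based confidence score, and all function names are assumptions made for this example; the abstract does not specify the authors' classifiers or features.

```python
from math import dist

def centroids(examples):
    """Mean feature vector per label; examples is a list of (vector, label)."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(cents, vec):
    """Return (label, confidence), where confidence is the distance margin
    between the nearest and second-nearest centroid (an assumed heuristic)."""
    ranked = sorted((dist(vec, c), lab) for lab, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin

def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2).
    Each round, each view labels the pool items it is most confident about."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            cents = centroids([(views[view], y) for views, y in labeled])
            scored = sorted(
                ((predict(cents, views[view]), views) for views in pool),
                key=lambda item: item[0][1],
                reverse=True,
            )
            for (label, _), views in scored[:per_round]:
                labeled.append((views, label))  # pseudo-label joins the pool
                pool.remove(views)
    return labeled

# Toy data: one-dimensional "review" and "author" views (hypothetical values).
seed = [(((0.0,), (0.0,)), "ham"), (((1.0,), (1.0,)), "spam")]
pool = [((0.9,), (0.8,)), ((0.1,), (0.2,)), ((0.7,), (0.95,))]
for views, label in co_train(seed, pool):
    print(views, label)
```

In this toy run each unlabeled item ends up pseudo-labeled by whichever view is most confident about it. A real system would cap the number of rounds and hold out labeled data to check that pseudo-labels are not degrading accuracy.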


Context Sensitive Topic Models for Author Influence in Document Networks

AAAI Conferences

In a document network such as a citation network of scientific documents, web-logs, etc., the content produced by authors exhibits their interest in certain topics. In addition, some authors influence other authors' interests. In this work, we propose to model the influence of cited authors along with the interests of citing authors. Moreover, we hypothesize that, for citations present in documents, the context surrounding the citation mention provides extra topical information about the cited authors. However, associating terms in the context to the cited authors remains an open problem. We propose novel document generation schemes that incorporate the context while simultaneously modeling the interests of citing authors and influence of the cited authors. Our experiments show significant improvements over baseline models for various evaluation criteria such as link prediction between document and cited author, and quantitatively explaining unseen text.


Interfacing Virtual Agents With Collaborative Knowledge: Open Domain Question Answering Using Wikipedia-Based Topic Models

AAAI Conferences

This paper is concerned with the use of conversational agents as an interaction paradigm for accessing open domain encyclopedic knowledge by means of Wikipedia. More precisely, we describe a dialogue-based question answering system for German which utilizes Wikipedia-based topic models as a reference point for context detection and answer prediction. We investigate two different perspectives on the task of interfacing virtual agents with collaborative knowledge. First, we exploit the use of Wikipedia categories as a basis for identifying the broader topic of a spoken utterance. Second, we describe how to enhance the conversational behavior of the virtual agent by means of a Wikipedia-based question answering component which incorporates the question topic. In general, our approach identifies topic-related focus terms of a user's question, which are subsequently mapped onto a category taxonomy. Thus, we utilize the taxonomy as a reference point to derive topic labels for a user's question. The employed topic model is thereby based on explicitly given concepts as represented by the document and category structure of the Wikipedia knowledge base. Identified topic categories are subsequently combined with different linguistic filtering methods to improve answer candidate retrieval and reranking. Results show that the topic model approach contributes to an enhancement of the conversational behavior of virtual agents.


Improving Topic Evaluation Using Conceptual Knowledge

AAAI Conferences

The growing number of statistical topic models has led to the need to better evaluate their output. Traditional evaluation methods estimate the model's fit to unseen data. It has recently been shown that human judgment can differ greatly from these measures. Thus, there is a pressing need for methods that better emulate human judgment. In this paper we present a system that computes the usefulness of individual topics from a given model on the basis of information drawn from a given ontology, in this case WordNet. The notion of utility is regarded as the ability to attribute a concept to each topic and separate words related to the topic from the unrelated ones based on that concept. In multiple experiments we show the correlation between the automatic evaluation method and the answers received from human evaluators, for various corpora and difficulty levels. By changing the evaluation focus from a statistical one to a conceptual one we were able to detect which topics are conceptually meaningful and rank them accordingly.


Semi-Supervised Learning for Imbalanced Sentiment Classification

AAAI Conferences

Various semi-supervised learning methods have been proposed recently to solve the longstanding shortage problem of manually labeled data in sentiment classification. However, most existing studies assume the balance between negative and positive samples in both the labeled and unlabeled data, which may not be true in reality. In this paper, we investigate a more common case of semi-supervised learning for imbalanced sentiment classification.

Trained on the imbalanced labeled data, most classification algorithms tend to predict test samples as the majority class and may ignore the minority class. Although many methods, such as re-sampling [Chawla et al., 2002], one-class classification [Juszczak and Duin, 2003], and cost-sensitive learning [Zhou and Liu, 2006], have been proposed to solve this issue, it is still unclear as to which method is more suitable to handle the imbalanced problem in sentiment classification and whether the method is extendable to ...


Incorporating Reviewer and Product Information for Review Rating Prediction

AAAI Conferences

Traditional sentiment analysis mainly considers binary classifications of reviews, but in many real-world sentiment classification problems, nonbinary review ratings are more useful. This is especially true when consumers wish to compare two products, both of which are not negative. Previous work has addressed this problem by extracting various features from the review text for learning a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer and product dependent effects in order to predict review ratings more accurately.

We call this task the rating-inference task; it determines an author's polarity evaluation within a multipoint scale (e.g. one to five "stars"). We explore solutions for this task in the context of product or service reviews, which are one of the most important opinion resources and are widely used by customers and companies. We observe that in many real-world scenarios, it is important to provide numerical ratings rather than binary decisions, especially when a customer compares several candidate products, all of which are positive in a binary classification, to make a purchase decision, since customers need to know not only whether a product is good or not, but also how good the product is. A recent study pointed out that many consumers are willing to pay at least 20% ...


Improving Performance of Topic Models by Variable Grouping

AAAI Conferences

Topic models have a wide range of applications, including modeling of text documents, images, user preferences, product rankings, and many others. However, learning optimal models may be difficult, especially for large problems. The reason is that inference techniques such as Gibbs sampling often converge to suboptimal models due to the abundance of local minima in large datasets. In this paper, we propose a general method of improving the performance of topic models. The method, called 'grouping transform', works by introducing auxiliary variables which represent assignments of the original model tokens to groups. Using these auxiliary variables, it becomes possible to resample an entire group of tokens at a time. This allows the sampler to make larger state space moves. As a result, better models are learned and performance is improved. The proposed ideas are illustrated on several topic models and several text and image datasets. We show that the grouping transform significantly improves performance over standard models.


LeadLag LDA: Estimating Topic Specific Leads and Lags of Information Outlets

AAAI Conferences

Identifying which outlet in social media leads the rest in disseminating novel information on specific topics is an interesting challenge for information analysts and social scientists. In this work, we hypothesize that novel ideas are disseminated through the creation and propagation of new or newly emphasized key words, and therefore the lead/lag of outlets can be estimated by tracking word usage across these outlets. First, we demonstrate the validity of our hypothesis by showing that a simple TF-IDF based nearest-neighbors approach can recover generally accepted lead/lag behavior on the outlet pair of ACM journal articles and conference papers. Next, we build a new topic model called LeadLag LDA that estimates the lead/lag of the outlets on specific topics. We validate the topic model using the lead/lag results from the TF-IDF nearest-neighbors approach. Finally, we present results from our model on two different outlet pairs, blogs vs. news media and grant proposals vs. research publications, that reveal interesting patterns.
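
To make the TF-IDF nearest-neighbors idea in the abstract above more concrete, here is a minimal sketch: each document in one outlet is matched to its most similar document in the other outlet by cosine similarity over TF-IDF vectors, and the average timestamp difference serves as a lead/lag estimate. The smoothed IDF formula, the use of a plain mean, and the names `tfidf_vectors`, `cosine`, and `mean_lag` are assumptions for illustration; the abstract does not give the authors' exact procedure.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse TF-IDF dict per document,
    using a smoothed IDF (an assumed weighting variant)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {t: count * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
         for t, count in Counter(doc).items()}
        for doc in docs
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def mean_lag(outlet_a, outlet_b):
    """Each outlet is a list of (timestamp, tokens).
    A positive result means documents in A tend to appear before their
    nearest neighbor in B, i.e. A leads B under this heuristic."""
    vecs = tfidf_vectors([tokens for _, tokens in outlet_a + outlet_b])
    vecs_a, vecs_b = vecs[:len(outlet_a)], vecs[len(outlet_a):]
    lags = []
    for (time_a, _), vec_a in zip(outlet_a, vecs_a):
        best = max(range(len(outlet_b)), key=lambda j: cosine(vec_a, vecs_b[j]))
        lags.append(outlet_b[best][0] - time_a)
    return sum(lags) / len(lags)

# Hypothetical toy outlets; timestamps are arbitrary integers (e.g. years).
conference = [(1, ["topic", "model", "lda"]), (2, ["sentiment", "twitter"])]
journal = [(3, ["topic", "model", "lda", "gibbs"]),
           (5, ["sentiment", "twitter", "mood"])]
print(mean_lag(conference, journal))
```

With these toy inputs the first outlet's documents each precede their nearest match in the second outlet, so the estimate is positive. Swapping the arguments flips the sign.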


Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena

AAAI Conferences

We perform a sentiment analysis of all tweets published on the microblogging platform Twitter in the second half of 2008. We use a psychometric instrument to extract six mood states (tension, depression, anger, vigor, fatigue, confusion) from the aggregated Twitter content and compute a six-dimensional mood vector for each day in the timeline. We compare our results to a record of popular events gathered from media and sources. We find that events in the social, political, cultural and economic sphere do have a significant, immediate and highly specific effect on the various dimensions of public mood. We speculate that large scale analyses of mood can provide a solid platform to model collective emotive trends in terms of their predictive value with regard to existing social as well as economic indicators.
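
The daily six-dimensional mood vector mentioned in the abstract above could be computed along the following lines. This is only a sketch: the tiny word lists in `MOOD_TERMS` are hypothetical placeholders rather than the psychometric instrument the authors used, and scoring each dimension as the fraction of a day's tokens that match its list is an assumed normalization.

```python
from collections import defaultdict

MOOD_DIMENSIONS = ["tension", "depression", "anger", "vigor", "fatigue", "confusion"]

# Hypothetical term lists, for illustration only.
MOOD_TERMS = {
    "tension": {"tense", "nervous", "anxious"},
    "depression": {"sad", "hopeless", "unhappy"},
    "anger": {"angry", "furious", "annoyed"},
    "vigor": {"energetic", "lively", "cheerful"},
    "fatigue": {"tired", "exhausted", "weary"},
    "confusion": {"confused", "bewildered", "unsure"},
}

def daily_mood_vectors(tweets):
    """tweets: iterable of (day, text). Returns {day: [six scores]}, where each
    score is the share of that day's tokens found in the dimension's term list."""
    tokens_by_day = defaultdict(list)
    for day, text in tweets:
        tokens_by_day[day].extend(text.lower().split())
    vectors = {}
    for day, tokens in tokens_by_day.items():
        total = len(tokens) or 1  # avoid division by zero on empty days
        vectors[day] = [
            sum(tok in MOOD_TERMS[dim] for tok in tokens) / total
            for dim in MOOD_DIMENSIONS
        ]
    return vectors

sample = [
    ("2008-11-04", "so tired and exhausted"),
    ("2008-11-04", "angry and furious today"),
    ("2008-11-05", "feeling energetic"),
]
for day, vector in daily_mood_vectors(sample).items():
    print(day, vector)
```

Aggregating tokens per day before scoring mirrors the abstract's description of working on aggregated content rather than on individual tweets.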


Sentiment Flow Through Hyperlink Networks

AAAI Conferences

How does sentiment flow through hyperlink networks? Earlier work on hyperlink networks has focused on the structure of the network, often modeling posts as nodes in a directed graph in which edges represent hyperlinks. At the same time, sentiment analysis has largely focused on classifying texts in isolation. Here we analyze a large hyperlinked network of mass media and weblog posts to determine how sentiment features of a post affect the sentiment of connected posts and the structure of the network itself. We explore the phenomenon of sentiment flow through experiments on a graph containing nearly 8 million nodes and 15 million edges. Our analysis indicates that (1) nodes are strongly influenced by their immediate neighbors, (2) deep cascades lead complex but predictable lives, (3) shallow cascades tend to be objective, and (4) sentiment becomes more polarized as depth increases.