Goto

Collaborating Authors

 Discourse & Dialogue


Topic Models to Infer Socio-Economic Maps

AAAI Conferences

Socio-economic maps contain important information regarding the population of a country. Computing these maps is critical given that policy makers often times make important decisions based upon such information. However, the compilation of socio-economic maps requires extensive resources and becomes highly expensive. On the other hand, the ubiquitous presence of cell phones, is generating large amounts of spatiotemporal data that can reveal human behavioral traits related to specific socio-economic characteristics. Traditional inference approaches have taken advantage of these datasets to infer regional socio-economic characteristics. In this paper, we propose a novel approach whereby topic models are used to infer socio-economic levels from large-scale spatio-temporal data. Instead of using a pre-determined set of features, we use latent Dirichlet Allocation (LDA) to extract latent recurring patterns of co-occurring behaviors across regions, which are then used in the prediction of socio-economic levels. We show that our approach improves state of the art prediction results by 9%.


Personalized Microblog Sentiment Classification via Multi-Task Learning

AAAI Conferences

Microblog sentiment classification is an interesting and important research topic with wide applications. Traditional microblog sentiment classification methods usually use a single model to classify the messages from different users and omit individuality. However, microblogging users frequently embed their personal character, opinion bias and language habits into their messages, and the same word may convey different sentiments in messages posted by different users. In this paper, we propose a personalized approach for microblog sentiment classification. In our approach, each user has a personalized sentiment classifier, which is decomposed into two components, a global one and a user-specific one. Our approach can capture the individual personality and at the same time leverage the common sentiment knowledge shared by all users. The personalized sentiment classifiers of massive users are trained in a collaborative way based on multi-task learning to handle the data sparseness problem. In addition, we incorporate users' social relations into our model to strengthen the learning of the personalized models. Moreover, we propose a distributed optimization algorithm to solve our model in parallel. Experiments on two real-world microblog sentiment datasets validate that our approach can improve microblog sentiment classification accuracy effectively and efficiently.


Improving Twitter Sentiment Classification Using Topic-Enriched Multi-Prototype Word Embeddings

AAAI Conferences

It has been shown that learning distributed word representations is highly useful for Twitter sentiment classification.Most existing models rely on a single distributed representation for each word.This is problematic for sentiment classification because words are often polysemous and each word can contain different sentiment polarities under different topics.We address this issue by learning topic-enriched multi-prototype word embeddings (TMWE).In particular, we develop two neural networks which 1) learn word embeddings that better capture tweet context by incorporating topic information, and 2) learn topic-enriched multiple prototype embeddings for each word.Experiments on Twitter sentiment benchmark datasets in SemEval 2013 show that TMWE outperforms the top system with hand-crafted features, and the current best neural network model.


Microsummarization of Online Reviews: An Experimental Study

AAAI Conferences

Mobile and location-based social media applications provide platforms for users to share brief opinions about products, venues, and services. These quickly typed opinions, or microreviews, are a valuable source of current sentiment on a wide variety of subjects. However, there is currently little research on how to mine this information to present it back to users in easily consumable way. In this paper, we introduce the task of microsummarization, which combines sentiment analysis, summarization, and entity recognition in order to surface key content to users. We explore unsupervised and supervised methods for this task, and find we can reliably extract relevant entities and the sentiment targeted towards them using crowdsourced labels as supervision. In an end-to-end evaluation, we find our best-performing system is vastly preferred by judges over a traditional extractive summarization approach. This work motivates an entirely new approach to summarization, incorporating both sentiment analysis and item extraction for modernized, at-a-glance presentation of public opinion.


Topical Analysis of Interactions Between News and Social Media

AAAI Conferences

The analysis of interactions between social media and traditional news streams is becoming increasingly relevant for a variety of applications, including: understanding the underlying factors that drive the evolution of data sources, tracking the triggers behind events, and discovering emerging trends.Researchers have explored such interactions by examining volume changes or information diffusions,however, most of them ignore the semantical and topical relationships between news and social media data.Our work is the first attempt to study how news influences social media, or inversely, based on topical knowledge.We propose a hierarchical Bayesian model that jointly models the news and social media topics and their interactions.We show that our proposed model can capture distinct topics for individual datasets as well as discover the topic influences among multiple datasets.By applying our model to large sets of news and tweets, we demonstrate its significant improvement over baseline methods and explore its power in the discovery of interesting patterns for real world cases.


Acquiring Knowledge of Affective Events from Blogs Using Label Propagation

AAAI Conferences

Many common events in our daily life affect us in positive and negative ways. For example, going on vacation is typically an enjoyable event, while being rushed to the hospital is an undesirable event. In narrative stories and personal conversations, recognizing that some events have a strong affective polarity is essential to understand the discourse and the emotional states of the affected people. However, current NLP systems mainly depend on sentiment analysis tools, which fail to recognize many events that are implicitly affective based on human knowledge about the event itself and cultural norms. Our goal is to automatically acquire knowledge of stereotypically positive and negative events from personal blogs. Our research creates an event context graph from a large collection of blog posts and uses a sentiment classifier and semi-supervised label propagation algorithm to discover affective events. We explore several graph configurations that propagate affective polarity across edges using local context, discourse proximity, and event-event co-occurrence. We then harvest highly affective events from the graph and evaluate the agreement of the polarities with human judgements.


Discourse Relations Detection via a Mixed Generative-Discriminative Framework

AAAI Conferences

Word embeddings, which can better capture the fine-grained semantics of words, have proven to be useful for a variety of natural language processing tasks. However, because discourse structures describe the relationships between segments of discourse, word embeddings cannot be directly integrated to perform the task. In this paper, we introduce a mixed generative-discriminative framework, in which we use vector offsets between embeddings of words to represent the semantic relations between text segments and Fisher kernel framework to convert a variable number of vector offsets into a fixed length vector. In order to incorporate the weights of these offsets into the vector, we also propose the Weighted Fisher Vector. Experimental results on two different datasets show that the proposed method without using manually designed features can achieve better performance on recognizing the discourse level relations in most cases.


Jointly Modeling Topics and Intents with Global Order Structure

AAAI Conferences

Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged through one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic modeling based Bayesian unsupervised model, to analyze the document intent structure cooperated with order information. Our model is flexible that has the ability to combine the annotations and do supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings, results show the superiority of our model over several state-of-the-art baselines.


Hashtag-Based Sub-Event Discovery Using Mutually Generative LDA in Twitter

AAAI Conferences

Sub-event discovery is an effective method for social event analysis in Twitter. It can discover sub-events from large amount of noisy event-related information in Twitter and semantically represent them. The task is challenging because tweets are short, informal and noisy. To solve this problem, we consider leveraging event-related hashtags that contain many locations, dates and concise sub-event related descriptions to enhance sub-event discovery. To this end, we propose a hashtag-based mutually generative Latent Dirichlet Allocation model(MGe-LDA). In MGe-LDA, hashtags and topics of a tweet are mutually generated by each other. The mutually generative process models the relationship between hashtags and topics of tweets, and highlights the role of hashtags as a semantic representation of the corresponding tweets. Experimental results show that MGe-LDA can significantly outperform state-of-the-art methods for sub-event discovery.


Progressive EM for Latent Tree Models and Hierarchical Topic Detection

AAAI Conferences

Hierarchical latent tree analysis (HLTA) is recently proposed as a new method for topic detection. It differs fundamentally from the LDA-based methods in terms of topic definition, topic-document relationship, and learning method. It has been shown to discover significantly more coherent topics and better topic hierarchies. However, HLTA relies on the Expectation-Maximization (EM) algorithm for parameter estimation and hence is not efficient enough to deal with large datasets. In this paper, we propose a method to drastically speed up HLTA using a technique inspired by the advances in the method of moments. Empirical experiments show that our method greatly improves the efficiency of HLTA. It is as efficient as the state-of-the-art LDA-based method for hierarchical topic detection and finds substantially better topics and topic hierarchies.