 Discourse & Dialogue


Toward Spoken Dialogue as Mutual Agreement

AAAI Conferences

A spoken dialogue system (SDS) has a social role: it supposedly allows people to communicate with a computer in ordinary language. A robust SDS should support coherent and habitable dialogue, even when it confronts situations for which it has no explicit pre-specified behavior. To ensure robust task completion, however, SDS designers typically produce systems that make a sequence of rigid demands on the user, and thereby lose any semblance of natural dialogue. The thesis of our work is that a dialogue should evolve as a set of agreements that arise from joint ...

The social and collaborative nature of dialogue challenges an SDS in many ways. The spontaneity of dialogue gives rise to disfluencies, where a person repeats or interrupts herself, produces filled pauses or false starts and self-repairs. Disfluencies play a fundamental role in dialogue, as signals for turn-taking (Gravano, 2009; Sacks, Schegloff and Jefferson, 1974) and for grounding to establish shared beliefs about the current state of mutual understanding (Clark and Schaefer, 1989). Most SDSs handle the content of the user's utterances, but do not fully integrate the way they address utterance meaning, disfluencies, turn-taking and the collaborative nature of grounding.


Characterizing Microblogs with Topic Models

AAAI Conferences

As microblogging grows in popularity, services like Twitter are coming to support information gathering needs above and beyond their traditional roles as social networks. But most users’ interaction with Twitter is still primarily focused on their social graphs, forcing the often inappropriate conflation of “people I follow” with “stuff I want to read.” We characterize some information needs that the current Twitter interface fails to support, and argue for better representations of content for solving these challenges. We present a scalable implementation of a partially supervised learning model (Labeled LDA) that maps the content of the Twitter feed into dimensions. These dimensions correspond roughly to substance, style, status, and social characteristics of posts. We characterize users and tweets using this model, and present results on two information consumption oriented tasks.


From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series

AAAI Conferences

We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling. ... consumer confidence and political opinion, and can also predict future movements in the polls. We find that temporal smoothing is a critically important issue to support a successful model.
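As a rough illustration of why temporal smoothing matters when comparing a noisy daily text signal with a poll series, the sketch below smooths a sentiment series with a trailing moving average and then computes a Pearson correlation. The function names, the window size, and the toy numbers are assumptions made for illustration; the abstract does not specify the smoothing or correlation procedure used.

```python
from math import sqrt

def moving_average(values, window):
    """Trailing moving average: each point averages up to `window` most recent values."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Made-up daily values, for illustration only (not data from the paper).
daily_sentiment = [1.2, 0.7, 1.5, 0.9, 1.6, 1.1, 1.8, 1.3]
poll_index = [50, 51, 52, 52, 54, 55, 56, 57]

raw_r = pearson(daily_sentiment, poll_index)
smoothed_r = pearson(moving_average(daily_sentiment, 3), poll_index)
print(f"raw r = {raw_r:.2f}, smoothed r = {smoothed_r:.2f}")
```

A longer window suppresses more day-to-day noise but also delays the smoothed signal, so the window length is a trade-off to tune rather than a fixed constant.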


ICWSM - A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews

AAAI Conferences

Sarcasm is a sophisticated form of speech act widely used in online communities. Automatic recognition of sarcasm is, however, a novel task. Sarcasm recognition could contribute to the performance of review summarization and ranking systems. This paper presents SASI, a novel Semi-supervised Algorithm for Sarcasm Identification that recognizes sarcastic sentences in product reviews. SASI has two stages: semi-supervised pattern acquisition, and sarcasm classification. We experimented on a data set of about 66000 Amazon reviews for various books and products. Using a gold standard in which each sentence was tagged by 3 annotators, we obtained precision of 77% and recall of 83.1% for identifying sarcastic sentences. We found some strong features that characterize sarcastic utterances. However, a combination of more subtle pattern-based features proved more promising in identifying the various facets of sarcasm. We also speculate on the motivation for using sarcasm in online communities and social networks.
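For readers less familiar with the reported metrics, the following minimal sketch shows how precision and recall are computed from binary gold and predicted labels, and how the two combine into an F1 score. The helper names and the toy labels are hypothetical; this is not the SASI evaluation code.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for binary labels (1 = sarcastic)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_mean(precision, recall)

def harmonic_mean(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy labels, for illustration only.
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1]
print(precision_recall_f1(gold, predicted))

# Harmonic mean of the precision and recall quoted in the abstract.
print(round(harmonic_mean(0.77, 0.831), 3))
```

Applied to the figures quoted above (precision 77%, recall 83.1%), the harmonic mean comes out at roughly 0.80. This is derived here from the rounded values and is not a number stated in the abstract.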


Classifier Calibration for Multi-Domain Sentiment Classification

AAAI Conferences

Textual sentiment classifiers classify texts into a fixed number of affective classes, such as positive, negative or neutral sentiment, or subjective versus objective information. It has been observed that sentiment classifiers suffer from a lack of generalization capability: a classifier trained on a certain domain generally performs worse on data from another domain. This phenomenon has been attributed to domain-specific affective vocabulary. In this paper, we propose a voting-based thresholding approach, which calibrates a number of existing single-domain classifiers with respect to sentiment data from a new domain. The approach presupposes only a small amount of annotated data from the new domain. We evaluate three criteria for estimating thresholds, and discuss the ramifications of these criteria for the trade-off between classifier performance and manual annotation effort.
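One way to picture voting-based thresholding is sketched below: each existing single-domain classifier produces a score, a decision threshold is re-estimated for each classifier on a small annotated sample from the new domain, and the thresholded decisions are combined by majority vote. The accuracy-maximizing criterion, the function names, and the toy scores are all assumptions made for this sketch; the abstract does not name the three criteria the paper evaluates.

```python
def best_threshold(scores, labels):
    """Pick the cut-off that maximizes accuracy on a small labeled sample.

    Assumed criterion for illustration; a score >= threshold means 'positive'.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("need equally sized, non-empty scores and labels")
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:  # ties keep the lowest threshold
            best_t, best_acc = t, acc
    return best_t

def calibrated_vote(score_row, thresholds):
    """Majority vote over classifiers, each using its own calibrated threshold."""
    votes = sum(s >= t for s, t in zip(score_row, thresholds))
    return votes * 2 > len(thresholds)

# Toy calibration sample from the new domain: one score list per classifier.
labels = [0, 0, 1, 1]
classifier_scores = [
    [0.1, 0.4, 0.6, 0.9],
    [0.3, 0.2, 0.5, 0.7],
    [0.2, 0.6, 0.5, 0.8],
]
thresholds = [best_threshold(s, labels) for s in classifier_scores]

# Classify a new text given the three classifiers' scores for it.
print(thresholds, calibrated_vote([0.7, 0.4, 0.9], thresholds))
```

The sketch only needs a handful of labeled examples per threshold, which mirrors the abstract's point that the approach presupposes a small amount of annotated data from the new domain.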


Generating Domain-Specific Clues Using News Corpus for Sentiment Classification

AAAI Conferences

This paper addresses the problem of automatically generating domain-specific sentiment clues. The main idea is to bootstrap from a small seed set and generate new clues by using dependencies and collocation information between sentiment clues and sentence-level topics that would be a primary subject of sentiment expression (e.g., event, company, and person). The experiments show that the aggregated clues are effective for sentiment classification.


The Wisdom of Bookies? Sentiment Analysis Versus the NFL Point Spread

AAAI Conferences

The American Football betting market provides a particularly attractive domain to study the nexus between public sentiment and the wisdom of crowds. In this paper, we present the first substantial study of the relationship between the NFL betting line and public opinion expressed in blogs and microblogs (Twitter). We perform a large-scale study of four distinct text streams: LiveJournal blogs, RSS blog feeds captured by Spinn3r, Twitter, and traditional news media. Our results show interesting disparities between the first and second halves of each season. We present evidence that sentiment is useful for NFL betting. We demonstrate that a strategy betting roughly 30 games per year identified the winner roughly 60% of the time from 2006 to 2009, well beyond what is needed to overcome the bookie's typical commission (53%).
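The break-even figure can be checked with a little arithmetic. The sketch below assumes the common point-spread pricing in which a bettor risks 110 units to win 100; that pricing is an assumption of this example and is not stated in the abstract, and the function names are hypothetical.

```python
def break_even_win_rate(risk, win):
    """Fraction of bets that must win for zero expected profit.

    Solves p * win - (1 - p) * risk = 0 for p.
    """
    return risk / (risk + win)

def expected_profit_per_bet(win_rate, risk, win):
    """Expected profit per bet, in the same units as `risk` and `win`."""
    return win_rate * win - (1 - win_rate) * risk

# Assumed pricing: risk 110 to win 100.
RISK, WIN = 110, 100

print(f"break-even win rate: {break_even_win_rate(RISK, WIN):.4f}")
print(f"expected profit at 60%: {expected_profit_per_bet(0.60, RISK, WIN):.2f}")
```

Under that assumed pricing the break-even rate is about 52.4%, in line with the roughly 53% quoted above, and a 60% win rate would yield an expected profit of about 16 units per 110 risked.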


“How Incredibly Awesome!” - Click Here to Read More

AAAI Conferences

We investigate the impact of a discussion snippet's overall sentiment on a user's willingness to read more of a discussion. Using sentiment analysis, we constructed positive, neutral, and negative discussion snippets using the discussion topic and a sample comment from discussions taking place around content on an enterprise social networking site. We computed personalized snippet recommendations for a subset of users and conducted a survey to test how these recommendations were perceived. Our experimental results show that snippets with high sentiments are better discussion "teasers."


Supervised Topic Models

arXiv.org Machine Learning

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which relies on variational methods to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and the political tone of amendments in the U.S. Senate based on the amendment text. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.


Syntactic Topic Models

arXiv.org Artificial Intelligence

The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.