
Evolution of Experts in Question Answering Communities

AAAI Conferences

Community Question Answering (CQA) services thrive as a result of a small number of highly active users, typically called experts, who provide a large number of high-quality, useful answers. Understanding the temporal dynamics and interactions between experts can present key insights into how community members evolve over time. In this paper, we present a temporal study of experts in CQA and analyze the changes in their behavioral patterns over time. Further, using unsupervised machine learning methods, we show interesting evolution patterns that can help us distinguish experts from one another. Using supervised classification methods, we show that models based on evolutionary data of users can be more effective at expert identification than models that ignore evolution. We run our experiments on two large online CQA services to show the generality of our proposed approach.


Crossing Media Streams with Sentiment: Domain Adaptation in Blogs, Reviews and Twitter

AAAI Conferences

Most sentiment analysis studies address classification of a single source of data such as reviews or blog posts. However, the multitude of social media sources available for text analysis lends itself naturally to domain adaptation. In this study, we create a dataset spanning three social media sources -- blogs, reviews, and Twitter -- and a set of 37 common topics. We first examine sentiments expressed in these three sources while controlling for the change in topic. Then, using this multi-dimensional data, we show that when classifying documents in one source (a target source), models trained on other sources of data can be as good as or even better than those trained on the target data. That is, we show that models trained on some social media sources are generalizable to others. All source adaptation models we implement show reviews and Twitter to be the best sources of training data. It is especially useful to know that models trained on Twitter data are generalizable, since, unlike reviews, Twitter is more topically diverse.


Around the Water Cooler: Shared Discussion Topics and Contact Closeness in Social Search

AAAI Conferences

Search engines are now augmenting search results with social annotations, i.e., endorsements from users’ social network contacts. However, there is currently a dearth of published research on the effects of these annotations on user choice. This work investigates two research questions associated with annotations: 1) do some contacts affect user choice more than others, and 2) are annotations relevant across various information needs. We conduct a controlled experiment with 355 participants, using hypothetical searches and annotations, and elicit users’ choices. We find that domain contacts are preferred to close contacts, and this preference persists across a variety of information needs. Further, these contacts need not be experts and might be identified easily from conversation data.


OMG, I Have to Tweet that! A Study of Factors that Influence Tweet Rates

AAAI Conferences

Many studies have shown that social data such as tweets are a rich source of information about the real world, including, for example, insights into health trends. A key limitation when analyzing Twitter data, however, is that it depends on people self-reporting their own behaviors and observations. In this paper, we present a large-scale quantitative analysis of some of the factors that influence self-reporting bias. In our study, we compare a year of tweets about weather events to ground-truth knowledge about actual weather occurrences. For each weather event we calculate how extreme, how expected, and how big a change the event represents. We calculate the extent to which these factors can explain the daily variations in tweet rates about weather events. We find that we can build global models that take into account basic weather information, together with extremeness, expectation, and change calculations, to account for over 40% of the variability in tweet rates. We also build location-specific models (i.e., one model per metropolitan area) that account for an average of 70% of the variability in tweet rates.
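As a rough illustration of what "accounting for a share of the variability" means, the sketch below fits a one-variable least-squares line and reports the coefficient of determination (R^2). It is not the authors' model: the single `extremeness` feature, the toy numbers, and the helper names `fit_line` and `r_squared` are all assumptions made up for this example.

```python
# Illustrative sketch only: hypothetical toy data, not values from the paper.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Fraction of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total sum of squares
    return 1 - ss_res / ss_tot


# Hypothetical daily values: how extreme the weather was vs. tweets per day.
extremeness = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0]
tweet_rate = [12.0, 15.0, 11.0, 20.0, 22.0, 30.0]

slope, intercept = fit_line(extremeness, tweet_rate)
preds = [slope * x + intercept for x in extremeness]
r2 = r_squared(tweet_rate, preds)
print(f"R^2 = {r2:.2f}")
```

An R^2 of 0.40 would correspond to the "over 40% of the variability" figure quoted for the global models; the abstract does not say which regression family or exact metric the authors used, so this reading is an assumption.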


Defense Mechanism or Socialization Tactic? Improving Wikipedia’s Notifications to Rejected Contributors

AAAI Conferences

Unlike traditional firms, open collaborative systems rely on volunteers to operate, and many communities struggle to maintain enough contributors to ensure the quality and quantity of content. However, Wikipedia has historically faced the exact opposite problem: too much participation, particularly from users who, knowingly or not, do not share the same norms as veteran Wikipedians. During its period of exponential growth, the Wikipedian community developed specialized socio-technical defense mechanisms to protect itself from the negatives of massive participation: spam, vandalism, falsehoods, and other damage. Yet recently, Wikipedia has faced a number of high-profile issues with recruiting and retaining new contributors. In this paper, we first illustrate and describe the various defense mechanisms at work in Wikipedia, which we hypothesize are inhibiting newcomer retention. Next, we present results from an experiment aimed at increasing both the quantity and quality of editors by altering various elements of these defense mechanisms, specifically pre-scripted warnings and notifications that are sent to new editors upon reverting or rejecting contributions. Using logistic regressions to model new user activity, we show which tactics work best for different populations of users based on their motivations when joining Wikipedia. In particular, we found that personalized messages in which Wikipedians identified themselves in active voice and took direct responsibility for rejecting an editor’s contributions were much more successful across a variety of outcome metrics than the current messages, which typically use an institutional and passive voice.


Exploring Social-Historical Ties on Location-Based Social Networks

AAAI Conferences

Location-based social networks (LBSNs) have become a popular form of social media in recent years. They provide location-related services that allow users to "check in" at geographical locations and share such experiences with their friends. Millions of "check-in" records in LBSNs contain rich information about social and geographical context and provide a unique opportunity for researchers to study users' social behavior from a spatial-temporal aspect, which in turn enables a variety of services including place advertisement, traffic forecasting, and disaster relief. In this paper, we propose a social-historical model to explore users' check-in behavior on LBSNs. Our model integrates the social and historical effects and assesses the role of social correlation in users' check-in behavior. In particular, our model captures the properties of a user's check-in history in the form of a power-law distribution and a short-term effect, and helps in explaining users' check-in behavior. The experimental results on a real-world LBSN demonstrate that our approach properly models users' check-ins and show how social and historical ties can help location prediction.


Distributional Footprints of Deceptive Product Reviews

AAAI Conferences

This paper postulates that there are natural distributions of opinions in product reviews. In particular, we hypothesize that for a given domain, there is a set of representative distributions of review rating scores. A deceptive business entity that hires people to write fake reviews will necessarily distort its distribution of review scores, leaving distributional footprints behind. In order to validate this hypothesis, we introduce strategies to create a dataset with a pseudo-gold standard that is labeled automatically based on different types of distributional footprints. A range of experiments confirms the hypothesized connection between distributional anomalies and deceptive reviews. This study also provides novel quantitative insights into the characteristics of natural distributions of opinions in the TripAdvisor hotel review and the Amazon product review domains.


You Too?! Mixed-Initiative LDA Story Matching to Help Teens in Distress

AAAI Conferences

Adolescent cyber-bullying on social networks is a phenomenon that has received widespread attention. Recent work by sociologists has examined this phenomenon under the larger context of teenage drama and its manifestations on social networks. Tackling cyber-bullying involves two key components: automatic detection of possible cases, and interaction strategies that encourage reflection and emotional support. Key is showing distressed teenagers that they are not alone in their plight. Conventional topic spotting and document classification into labels like "dating" or "sports" are not enough to effectively match stories for this task. In this work, we examine a corpus of 5500 stories from distressed teenagers from a major youth social network. We combine Latent Dirichlet Allocation and human interpretation of its output using principles from sociolinguistics to extract high-level themes in the stories and use them to match new stories to similar ones. A user evaluation of the story matching shows that theme-based retrieval does a better job of finding relevant and effective stories for this application than conventional approaches.


Not All Moods Are Created Equal! Exploring Human Emotional States in Social Media

AAAI Conferences

Emotional states of individuals, also known as moods, are central to the expression of thoughts, ideas and opinions, and in turn impact attitudes and behavior. As social media tools are increasingly used by individuals to broadcast their day-to-day happenings, or to report on an external event of interest, understanding the rich ‘landscape’ of moods will help us better interpret and make sense of the behavior of millions of individuals. Motivated by literature in psychology, we study a popular representation of the human mood landscape, known as the ‘circumplex model’, that characterizes affective experience through two dimensions: valence and activation. We identify more than 200 moods frequent on Twitter, through Mechanical Turk studies and psychology literature sources, and report on four aspects of mood expression: the relationship between (1) moods and usage levels, including linguistic diversity of shared content, (2) moods and the social ties individuals form, (3) moods and amount of network activity of individuals, and (4) moods and participatory patterns of individuals such as link sharing and conversational engagement. Our results provide at-scale naturalistic assessments and extensions of existing conceptualizations of human mood in social media contexts.


The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City

AAAI Conferences

Studying the social dynamics of a city on a large scale has traditionally been a challenging endeavor, requiring long hours of observation and interviews, usually resulting in only a partial depiction of reality. At the same time, the boundaries of municipal organizational units, such as neighborhoods and districts, are largely statically defined by the city government and do not always reflect the character of life in these areas. To address both difficulties, we introduce a clustering model and research methodology for studying the structure and composition of a city based on the social media its residents generate. We use data from approximately 18 million check-ins collected from users of a location-based online social network. The resulting clusters, which we call Livehoods, are representations of the dynamic urban areas that comprise the city. We take an interdisciplinary approach to validating these clusters, interviewing 27 residents of Pittsburgh, PA, to see how their perceptions of the city project onto our findings there. Our results provide strong support for the discovered clusters, showing how Livehoods reveal the distinctly characterized areas of the city and the forces that shape them.