SearchBuddies: Bringing Search Engines into the Conversation
Hecht, Brent (Northwestern University) | Teevan, Jaime (Microsoft Research) | Morris, Meredith Ringel (Microsoft Research) | Liebling, Dan (Microsoft Research)
Although people receive trusted, personalized recommendations and auxiliary social benefits when they ask questions of their friends, using a search engine is often a more effective way to find an answer. Attempts to integrate social and algorithmic search have thus far focused on bringing social content into algorithmic search results. However, more of the benefits of social search can be preserved by reversing this approach and bringing algorithmic content into natural question-based conversations. To do this successfully, it is necessary to adapt search engine interaction to a social context. In this paper, we present SearchBuddies, a system that responds to Facebook status message questions with algorithmic search results. Via a three-month deployment of the system to 122 social network users, we explore how people responded to search content in a highly social environment. Our experience deploying SearchBuddies shows that a socially embedded search engine can successfully provide users with unique and highly relevant information in a social context and can be integrated into conversations around an information need. The deployment also illuminates specific challenges of embedding a search engine in a social environment and provides guidance toward solutions.
Crossing Media Streams with Sentiment: Domain Adaptation in Blogs, Reviews and Twitter
Mejova, Yelena (The University of Iowa) | Srinivasan, Padmini (The University of Iowa)
Most sentiment analysis studies address classification of a single source of data such as reviews or blog posts. However, the multitude of social media sources available for text analysis lends itself naturally to domain adaptation. In this study, we create a dataset spanning three social media sources -- blogs, reviews, and Twitter -- and a set of 37 common topics. We first examine sentiments expressed in these three sources while controlling for the change in topic. Then using this multi-dimensional data we show that when classifying documents in one source (a target source), models trained on other sources of data can be as good as or even better than those trained on the target data. That is, we show that models trained on some social media sources are generalizable to others. All source adaptation models we implement show reviews and Twitter to be the best sources of training data. It is especially useful to know that models trained on Twitter data are generalizable, since, unlike reviews, Twitter is more topically diverse.
Network Sampling Designs for Relational Classification
Ahmed, Nesreen K. (Purdue University) | Neville, Jennifer (Purdue University) | Kompella, Ramana (Purdue University)
Relational classification has been extensively studied recently due to its applications in social, biological, technological, and information networks. Much of the work in relational learning has focused on analyzing input data that comprise a single network. Although machine learning researchers have considered the issue of how to sample training and test sets from the input network (for evaluation), the mechanisms used to construct the input networks have largely been ignored. In most cases, the input network has itself been sampled from a larger target network (e.g., Facebook), and often the researcher is unaware of how the input network was constructed or what impact that may have on evaluation of the relational models. Since the goal in evaluating relational classification algorithms is to accurately assess their performance on the larger target network, it is critical to understand what impact the initial sampling method may have on our estimates of classification accuracy. In this paper, we present different sampling methods and systematically study their impact on evaluation of relational classification. Our results indicate that the choice of sampling method can affect classification performance and, consequently, the accuracy of evaluation.
Talk of the City: Our Tweets, Our Community Happiness
Quercia, Daniele (University of Cambridge) | Seaghdha, Diarmuid O (University of Cambridge) | Crowcroft, Jon (University of Cambridge)
The literature of urban sociology and that of psychology have separately established two relationships: the first has linked characteristics of a community to its residents' well-being, the second has linked well-being of individuals to their use of words. No one has hitherto explored the potential transitive relationship - that between characteristics of a community and its residents' use of words. We test this relationship by performing three steps. We consider Twitter users in a variety of London census communities; extract the subject matter of their tweets using "topic models"; and study the relationship between topics and community socio-economic well-being. We find that certain topics are correlated (positively and negatively) with community deprivation. Users in more deprived communities tweet about wedding parties, matters expressed in Spanish/Portuguese, and celebrity gossip. By contrast, those in less deprived communities tweet about vacations, professional use of social media, environmental issues, sports, and health issues. We finally show that monitoring the subject matter of tweets not only offers insights into community well-being, but is also a reasonable way of predicting community deprivation scores.
Do You Feel What I Feel? Social Aspects of Emotions in Twitter Conversations
Kim, Suin (KAIST) | Bak, JinYeong (KAIST) | Oh, Alice Haeyun (KAIST)
We present a computational framework for understanding the social aspects of emotions in Twitter conversations. Using unannotated data and semi-supervised machine learning, we look at emotional transitions, emotional influences among the conversation partners, and patterns in the overall emotional exchanges. We find that conversational partners usually express the same emotion, which we call emotion accommodation, but when they do not, one of the conversational partners tends to respond with a positive emotion. We also show that tweets containing sympathy, apology, and complaint are significant emotion influencers. We verify the emotion classification part of our framework against a human-annotated corpus.
Coping with the Document Frequency Bias in Sentiment Classification
Rafrafi, Abdelhalim (University Pierre et Marie Curie) | Guigue, Vincent (University Pierre et Marie Curie) | Gallinari, Patrick (University Pierre et Marie Curie)
In this article, we study the polarity detection problem using linear supervised classifiers. We show that penalizing document frequencies in the regularization process increases accuracy. We propose a systematic comparison of different loss and regularization functions on this particular task using the Amazon dataset. Then, we evaluate our models according to three criteria: accuracy, sparsity and subjectivity. The subjectivity is measured by projecting our dictionary and optimized weight vector on the SentiWordNet lexicon. This original approach highlights a bias in the selection of the relevant terms during the regularization procedure: frequent terms are overweighted compared to their intrinsic subjectivities. We show that this bias appears regardless of the chosen loss or regularization and on all datasets: it is closely linked to the gradient descent technique. Penalizing the document frequency during the learning step significantly improves performance. Many sentiment markers appear rarely and are thus underweighted by statistical learning algorithms. Explicitly boosting their influence increases accuracy in the sentiment classification task.
Where Is This Tweet From? Inferring Home Locations of Twitter Users
Mahmud, Jalal (IBM Research - Almaden) | Nichols, Jeffrey (IBM Research - Almaden) | Drews, Clemens (IBM Research - Almaden)
We present a new algorithm for inferring the home locations of Twitter users at different granularities, such as city, state, or time zone, using the content of their tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations. We find that a hierarchical classification approach can improve prediction accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the location of Twitter users.
Computational Predictors in Online Social Deliberations
Woolf, Beverly Park (University of Massachusetts-Amherst) | Murray, Thomas (University of Massachusetts-Amherst) | Xu, Xiaoxi (University of Massachusetts-Amherst) | Osterweil, Leon (University of Massachusetts-Amherst) | Clarke, Lori (University of Massachusetts-Amherst) | Wing, Leah (University of Massachusetts-Amherst) | Katsh, Ethan (University of Massachusetts-Amherst)
This research seeks to identify online participants' disposition and skills. A prototype dashboard and annotation scheme were developed to support facilitators, and several computational predictors were identified that show statistically significant correlations with dialogue skills as observed by human annotators.
Where Online Friends Meet: Social Communities in Location-Based Networks
Brown, Chloë (University of Cambridge) | Nicosia, Vincenzo (University of Cambridge) | Scellato, Salvatore (University of Cambridge) | Noulas, Anastasios (University of Cambridge) | Mascolo, Cecilia (University of Cambridge)
Recent research suggests that, as in offline scenarios, spatial proximity increases the likelihood that two individuals establish an online social connection, and geographic closeness could therefore influence the formation of online communities. In this work we present a study of communities in two online social networks with location-sharing features and analyze their social and spatial properties. We study the places users visit to understand whether communities revolve around places or whether they exist independently. Our results suggest that community structure in social networks may arise from both social and spatial factors, so that exploiting information about the places where people go could benefit the definition of new community detection methods and community evolution models.
Learning the Nature of Information in Social Networks
Agrawal, Rakesh (Microsoft) | Potamias, Michalis (Groupon) | Terzi, Evimaria (Boston University)
We postulate that the nature of information items plays a vital role in the observed spread of these items in a social network. We capture this intuition by proposing a model that assigns to every information item two parameters: endogeneity and exogeneity. The endogeneity of the item quantifies its tendency to spread primarily through the connections between nodes; the exogeneity quantifies its tendency to be acquired by the nodes, independently of the underlying network. We also extend this item-based model to take into account the openness of each node to new information. We quantify openness by introducing the receptivity of a node. Given a social network and data related to the ordering of adoption of information items by nodes, we develop a maximum-likelihood framework for estimating endogeneity, exogeneity and receptivity parameters. We apply our methodology to synthetic and real data and demonstrate its efficacy as a data-analytic tool.