Statistical Learning
Sensing Urban Social Geography Using Online Social Networking Data
Phithakkitnukoon, Santi (Massachusetts Institute of Technology)
The growing pool of publicly generated bits, such as online social networking data, makes it possible to sense social dynamics in urban space. In this position paper, we use location-based online social networking data to sense geo-social activity and analyze the underlying social activity distribution of three cities: London, Paris, and New York. We find a non-linear distribution of social activity that follows a power-law decay function. We perform inter-urban analysis based on social activity distribution and clustering. We believe that our study sheds new light on context-aware urban computing and social sensing.
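The abstract does not say how the power-law decay was fitted. As a minimal sketch of one common approach, assuming the activity counts are positive and indexed by a positive variable such as rank or distance, the exponent can be estimated by ordinary least squares in log-log space. The function name `fit_power_law` and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) by least squares on (log x, log y).

    Hypothetical helper for illustration; returns (c, alpha).
    All xs and ys must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the regression line in log-log space.
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic activity counts generated from a known power law (made-up data).
ranks = list(range(1, 21))
activity = [500.0 * r ** -1.5 for r in ranks]
c, alpha = fit_power_law(ranks, activity)
```

A log-log regression is simple but can be biased on noisy, heavy-tailed data; maximum-likelihood estimators are often preferred when the exponent itself is the quantity of interest.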
Exploiting Semantic Annotations for Clustering Geographic Areas and Users in Location-based Social Networks
Noulas, Anastasios (University of Cambridge) | Scellato, Salvatore (University of Cambridge) | Mascolo, Cecilia (University of Cambridge) | Pontil, Massimiliano (University College London)
Location-Based Social Networks (LBSN) are so far the most vivid realization of the convergence of the physical and virtual social planes. In this work we propose a novel approach to modeling human activity and geographical areas by means of place categories. We apply a spectral clustering algorithm to areas and users of two metropolitan cities, using a dataset sourced from the most vibrant LBSN, Foursquare. Our methodology allows the identification of user communities that visit similar categories of places and the comparison of urban neighborhoods within and across cities. We demonstrate how semantic information attached to places could be plausibly used as a modeling interface for applications such as recommender systems and digital tourist guides.
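The abstract does not specify how areas or users are represented before clustering. One plausible reading, sketched here purely as an assumption, is to describe each area by the relative frequency of place categories among its check-ins and to compare areas by cosine similarity; the resulting pairwise similarities could then serve as the affinity matrix for a spectral clustering algorithm. The helper names, category labels, and data below are hypothetical.

```python
import math
from collections import Counter

def category_profile(checkin_categories):
    """Turn a list of check-in category labels into relative frequencies."""
    counts = Counter(checkin_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse category profiles (dicts)."""
    dot = sum(value * q.get(category, 0.0) for category, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)

# Made-up areas with made-up check-in categories.
area_a = category_profile(["bar", "bar", "nightclub", "restaurant"])
area_b = category_profile(["bar", "nightclub", "nightclub", "restaurant"])
area_c = category_profile(["office", "office", "cafe", "office"])

sim_ab = cosine_similarity(area_a, area_b)  # two nightlife-heavy areas
sim_ac = cosine_similarity(area_a, area_c)  # nightlife vs. office area
```

The same profile construction applies to users by replacing an area's check-ins with a user's check-ins.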
Social Mechanics: An Empirically Grounded Science of Social Media
Lerman, Kristina (USC Information Sciences Institute) | Galstyan, Aram (USC Information Sciences Institute) | Steeg, Greg Ver (USC Information Sciences Institute) | Hogg, Tad (Hewlett-Packard)
What will social media sites of tomorrow look like? What behaviors will their interfaces enable? A major challenge for designing new sites that allow a broader range of user actions is the difficulty of extrapolating from experience with current sites without first distinguishing correlations from underlying causal mechanisms. The growing availability of data on user activities provides new opportunities to uncover correlations among user activity, contributed content and the structure of links among users. However, such correlations do not necessarily translate into predictive models. Instead, empirically grounded mechanistic models provide a stronger basis for establishing causal mechanisms and discovering the underlying statistical laws governing social behavior. We describe a statistical physics-based framework for modeling and analyzing social media and illustrate its application to the problems of prediction and inference. We hope these examples will inspire the research community to explore these methods to look for empirically valid causal mechanisms for the observed correlations.
Does Bad News Go Away Faster?
Wu, Shaomei (Cornell University) | Tan, Chenhao (Cornell University) | Kleinberg, Jon (Cornell University) | Macy, Michael Walton (Cornell University)
We study the relationship between content and temporal dynamics of information on Twitter, focusing on the persistence of information. We compare two extreme temporal patterns in the decay rate of URLs embedded in tweets, defining a prediction task to distinguish between URLs that fade rapidly following their peak of popularity and those that fade more slowly. Our experiments show a strong association between the content and the temporal dynamics of information: given unigram features extracted from corresponding HTML webpages, a linear SVM classifier can predict the temporal pattern of URLs with high accuracy. We further explore the content of URLs in the two temporal classes using various textual analysis techniques (via LIWC and trend detection). We find that the rapidly-fading information contains significantly more words related to negative emotion, actions, and more complicated cognitive processes, whereas the persistent information contains more words related to positive emotion, leisure, and lifestyle.
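To make the prediction task concrete, the following is a minimal sketch of a linear classifier trained with hinge loss on unigram counts. It is not the authors' pipeline: the training loop is a plain subgradient method written for illustration, the hyperparameters are arbitrary, and the four example texts and their labels (+1 for persistent, -1 for rapidly fading) are invented.

```python
from collections import Counter

def unigram_features(text):
    """Bag-of-words counts over lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())

def score(weights, bias, feats):
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())

def train_linear_svm(docs, labels, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on L2-regularized hinge loss (illustrative only)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(docs, labels):
            margin = y * score(weights, bias, feats)
            # Shrink all weights: gradient step on the L2 penalty.
            for w in weights:
                weights[w] *= 1.0 - lr * reg
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + lr * y * c
                bias += lr * y
    return weights, bias

def predict(weights, bias, feats):
    return 1 if score(weights, bias, feats) >= 0.0 else -1

# Invented page texts: +1 = persistent, -1 = rapidly fading.
texts = [
    "relaxing holiday recipes",
    "weekend travel guide",
    "breaking crash report",
    "urgent storm warning",
]
labels = [1, 1, -1, -1]
docs = [unigram_features(t) for t in texts]
weights, bias = train_linear_svm(docs, labels)
predictions = [predict(weights, bias, d) for d in docs]
```

Because the model is linear, the learned per-word weights can be inspected directly, which is one reason unigram features pair naturally with the kind of content analysis the abstract goes on to describe.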
Towards Discovery of Influence and Personality Traits through Social Link Prediction
Nguyen, Thin (Curtin University of Technology) | Phung, Dinh (Curtin University of Technology) | Adams, Brett (Curtin University of Technology) | Venkatesh, Svetha (Curtin University of Technology)
Estimation of a person's influence and personality traits from social media data has many applications. We use social linkage criteria, such as number of followers and friends, as proxies to form corpora from the popular blogging site LiveJournal, for examining two two-class classification problems: influential vs. non-influential, and extraversion vs. introversion. Classification is performed using automatically derived psycholinguistic and mood-based features of a user's textual messages. We experiment with three sub-corpora of 10,000 users each, and present the most effective predictors for each category. The best classification result, at 80%, is achieved using psycholinguistic features; e.g., influentials are found to use more complex language than non-influentials, and to use more leisure-related terms.
Supervised Topic Segmentation of Email Conversations
Joty, Shafiq (University of British Columbia) | Carenini, Giuseppe (University of British Columbia) | Murray, Gabriel (University of British Columbia) | Ng, Raymond T (University of British Columbia)
We propose a graph-theoretic supervised topic segmentation model for email conversations which combines (i) lexical knowledge, (ii) conversational features, and (iii) topic features. We compare our results with the existing unsupervised models (i.e., LCSeg and LDA), and with their two extensions for email conversations (i.e., LCSeg+FQG and LDA+FQG) that not only use lexical information but also exploit finer conversation structure. Empirical evaluation shows that our supervised model is the best performer and achieves the highest accuracy by combining the three knowledge sources, with knowledge about the conversation proving to be the most important indicator for segmenting emails.
Characterizing Social Relations Via NLP-Based Sentiment Analysis
Groh, Georg (TU Muenchen) | Hauffa, Jan (TU Muenchen)
We investigate and evaluate methods for the characterization of social relations from textual communication context, using e-mail as an example. Social relations are intrinsically characterized by the Cartesian product of weights on various axes (we employ valuation and intensity as examples). The prediction of these characteristics is performed by application of unsupervised learning algorithms on meta-data, communication statistics, and the results of deep linguistic analysis of the message body. Classification of sentiment polarity is chosen as the means of linguistic analysis. We find that prediction accuracy can be improved by introducing limited amounts of additional information.
Automatically Identifying Groups Based on Content and Collective Behavioral Patterns of Group Members
Gregory, Michelle (Pacific Northwest National Laboratory) | Engel, Dave W. (Pacific Northwest National Laboratory) | Bell, Eric (Pacific Northwest National Laboratory) | Piatt, Andy (Pacific Northwest National Laboratory) | Dowson, Scott (Pacific Northwest National Laboratory) | Cowell, Andrew (Pacific Northwest National Laboratory)
The explosion of popularity in social media, such as internet forums, weblogs (blogs), wikis, etc., in the past decade has created a new opportunity to measure public opinion, attitude, and social structures (Agichtein et al. 2008, Qualman 2010). A very common social structure investigated is online communities, or groups. There are a number ... For example, on Live Journal, there are a number of categories, gaming, for example, that one can categorize themselves and their blogs. While a number of those that self select that category may interact, there is no explicit requirement to do so. If one is interested in marketing to a gaming crowd, for instance, knowing all persons interested in gaming would be useful, even if they do not interact directly with one another.
Large-Scale Community Detection on YouTube for Topic Discovery and Exploration
Gargi, Ullas (Google, Inc.) | Lu, Wenjun (University of Maryland) | Mirrokni, Vahab (Google, Inc.) | Yoon, Sangho (Google, Inc.)
Detecting coherent, well-connected communities in large graphs provides insight into the graph structure and can serve as the basis for content discovery. Clustering is a popular technique for community detection but global algorithms that examine the entire graph do not scale. Local algorithms are highly parallelizable but perform sub-optimally, especially in applications where we need to optimize multiple metrics. We present a multi-stage algorithm based on local-clustering that is highly scalable, combining a pre-processing stage, a local clustering stage, and a post-processing stage. We apply it to the YouTube video graph to generate named clusters of videos with coherent content. We formalize coverage, coherence, and connectivity metrics and evaluate the quality of the algorithm for large YouTube graphs. Our use of local algorithms for global clustering, and its implementation and practical evaluation on such a large scale is a first of its kind.
Creating Conversations: An Automated Dialog System
Gandy, Lisa (Northwestern University) | Hammond, Kristian (Northwestern University)
Online news sites often include a comments section where readers are allowed to leave their thoughts. These comments often contain interesting and insightful conversations between readers about the news article. However, the richness of these conversations is often lost among other meaningless comments, and moreover all comments are found at the bottom of the web page. In this article, we discuss how our system inserts reader conversations into the news article to create a multimedia presentation called Shout Out. Shout Out features two virtual news anchors: one anchor reads the news and, when appropriate, pauses to have a conversation about the news with the other anchor. The current iteration of Shout Out combines natural language techniques and reader conversations to create an engaging system.