The Length of Bridge Ties: Structural and Geographic Properties of Online Social Interactions

AAAI Conferences

The popularity of the Web has allowed individuals to communicate and interact with each other on a global scale: people connect both to close friends and acquaintances, creating ties that can bridge otherwise separated groups of people. Recent evidence suggests that spatial distance is still affecting social links established on online platforms, with online ties preferentially connecting closer people. In this work we study the relationships between interaction strength, spatial distance and structural position of ties between members of a large-scale online social networking platform, Tuenti. We discover that ties in highly connected social groups tend to span shorter distances than connections bridging together otherwise separated portions of the network. We also find that such bridging connections have lower social interaction levels than ties within the inner core of the network and ties connecting to its periphery. Our results suggest that spatial constraints on online social networks are intimately connected to structural network properties, with important consequences for information diffusion.


Modeling Spread of Disease from Social Interactions

AAAI Conferences

Research in computational epidemiology to date has concentrated on coarse-grained statistical analysis of populations, often synthetic ones. By contrast, this paper focuses on fine-grained modeling of the spread of infectious diseases throughout a large real-world social network. Specifically, we study the roles that social ties and interactions between specific individuals play in the progress of a contagion. We focus on public Twitter data, where we find that for every health-related message there are more than 1,000 unrelated ones. This class imbalance makes classification particularly challenging. Nonetheless, we present a framework that accurately identifies sick individuals from the content of online communication. Evaluation on a sample of 2.5 million geo-tagged Twitter messages shows that social ties to infected, symptomatic people, as well as the intensity of recent co-location, sharply increase one's likelihood of contracting the illness in the near future. To our knowledge, this work is the first to model the interplay of social activity, human mobility, and the spread of infectious disease in a large real-world population. Furthermore, we provide the first quantifiable estimates of the characteristics of disease transmission on a large scale without active user participation, a step towards our ability to model and predict the emergence of global epidemics from day-to-day interpersonal interactions.


Coping with the Document Frequency Bias in Sentiment Classification

AAAI Conferences

In this article, we study the polarity detection problem using linear supervised classifiers. We show the benefit of penalizing document frequencies in the regularization process to increase accuracy. We propose a systematic comparison of different loss and regularization functions on this particular task using the Amazon dataset. Then, we evaluate our models according to three criteria: accuracy, sparsity and subjectivity. Subjectivity is measured by projecting our dictionary and optimized weight vector onto the SentiWordNet lexicon. This original approach highlights a bias in the selection of relevant terms during the regularization procedure: frequent terms are overweighted relative to their intrinsic subjectivity. We show that this bias appears whatever the chosen loss or regularization and on all datasets: it is closely linked to the gradient descent technique. Penalizing the document frequency during the learning step enables us to significantly improve performance. Many sentiment markers appear rarely and are thus undervalued by statistical learning algorithms. Explicitly boosting their influence increases accuracy in the sentiment classification task.
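
The idea of penalizing document frequency during regularization can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: a logistic loss, an L2 penalty whose strength for each term is proportional to that term's document frequency, plain batch gradient descent, and the hypothetical names `document_frequencies`, `train_df_penalized` and `base_lambda`.

```python
import math


def document_frequencies(docs, vocab):
    """Return, for each term in vocab, the fraction of documents containing it."""
    n = len(docs)
    return [sum(1 for d in docs if t in d) / n for t in vocab]


def train_df_penalized(docs, labels, vocab, base_lambda=0.1, lr=0.5, epochs=200):
    """Logistic regression with a per-term L2 penalty scaled by document frequency.

    Assumption (not taken from the abstract): the penalty on weight j is
    base_lambda * df_j * w_j**2, so frequent terms are shrunk harder than
    rare ones. docs is a list of token sets, labels a list of 0/1 values.
    """
    df = document_frequencies(docs, vocab)
    penalties = [base_lambda * f for f in df]
    # Binary bag-of-words features.
    X = [[1.0 if t in d else 0.0 for t in vocab] for d in docs]
    w = [0.0] * len(vocab)
    n = len(docs)
    for _ in range(epochs):
        grad = [0.0] * len(vocab)
        for x, y in zip(X, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        for j in range(len(w)):
            # Gradient of the logistic loss plus gradient of the DF-scaled penalty.
            w[j] -= lr * (grad[j] + 2.0 * penalties[j] * w[j])
    return w, df


# Toy, made-up data: "the" occurs everywhere, the polarity words occur less often.
vocab = ["the", "great", "awful", "book"]
docs = [{"the", "great"}, {"the", "awful"},
        {"the", "great", "book"}, {"the", "awful", "book"}]
labels = [1, 0, 1, 0]
weights, df = train_df_penalized(docs, labels, vocab)
```

In this toy run the ubiquitous term carries the largest penalty, while the rarer polarity words are penalized less and keep weights of the expected sign.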


Facebook and Privacy: The Balancing Act of Personality, Gender, and Relationship Currency

AAAI Conferences

Social media profiles are telling examples of the everyday need for disclosure and concealment. The balance between concealment and disclosure varies across individuals, and personality traits might partly explain this variability. Experimental findings on the relationship between information disclosure and personality have so far been inconsistent. We thus study this relationship anew with 1,313 Facebook users in the United States using two personality tests: the Big Five personality test and the self-monitoring test. We model the process of information disclosure in a principled way using Item Response Theory and correlate the resulting user disclosure scores with personality traits. We find a correlation with the trait of Openness and observe gender effects, in that men and women share equal amounts of private information, but men tend to make it more publicly available, well beyond their social circles. Interestingly, geographic (e.g., residence, hometown) and work-related information is used as relationship currency, in that it is selectively shared with social contacts and is rarely shared with the Facebook community at large.


Modeling Destructive Group Dynamics in On-Line Gaming Communities

AAAI Conferences

Social groups often exhibit a high degree of dynamism. Some groups thrive, while many others die over time. Modeling destructive dynamics and understanding whether/why/when a person will depart from a group can be important in a number of social domains. In this paper, we take the World of Warcraft game as an exemplar platform for studying destructive group dynamics. We build models to predict if and when an individual is going to quit his/her guild, and whether this quitting event will inflict substantial damage on the guild. Our predictors start from in-game census data and extract features from multiple perspectives such as individual-level, guild-level, game activity, and social interaction features. Our study shows that destructive group dynamics can often be predicted with modest to high accuracy, and feature diversity is critical to prediction performance.


On the Study of Social Interactions in Twitter

AAAI Conferences

Twitter and other social media platforms are increasingly used as the primary way in which people speak with each other. As opposed to other platforms, Twitter is interesting in that many of these dialogues are public, and so we can get a view into the dynamics of dialogues and how they differ from other tweet behaviors. Here we analyze tweets gathered from 2400 Twitter streams over a one-month period. We study social interactions in three important dimensions: what are the salient user behaviors in terms of how often they have social interactions and how these interactions are spread among different people; what are the characteristics of the dialogues, or sets of tweets, that we can extract from these interactions; and what are the characteristics of the social network which emerges from considering these interactions? We find that roughly half of the users spend a fair amount of time interacting whereas 40% of users do not seem to have active interactions. We also find that the vast majority of active dialogues only involve two people despite the public nature of these tweets. We finally find that while the emerging social network does contain a giant component, the component clearly is a set of well-defined tight clusters which are loosely connected.


OMG, I Have to Tweet that! A Study of Factors that Influence Tweet Rates

AAAI Conferences

Many studies have shown that social data such as tweets are a rich source of information about the real world including, for example, insights into health trends. A key limitation when analyzing Twitter data, however, is that it depends on people self-reporting their own behaviors and observations. In this paper, we present a large-scale quantitative analysis of some of the factors that influence self-reporting bias. In our study, we compare a year of tweets about weather events to ground-truth knowledge about actual weather occurrences. For each weather event we calculate how extreme, how expected, and how big a change the event represents. We calculate the extent to which these factors can explain the daily variations in tweet rates about weather events. We find that we can build global models that take into account basic weather information, together with extremeness, expectation and change calculations, to account for over 40% of the variability in tweet rates. We build location-specific models (i.e., one model per metropolitan area) that account for an average of 70% of the variability in tweet rates.
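
A statement that a model "accounts for" some share of the variability is commonly read as a coefficient of determination (R squared). The abstract does not name the statistic, so this reading is an assumption. The sketch below fits a one-variable least-squares line and computes R squared. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Share of the variance in ys explained by preds: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up example: a daily "extremeness" score against a daily tweet rate.
extremeness = [0.0, 1.0, 2.0, 3.0]
tweet_rate = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(extremeness, tweet_rate)
preds = [slope * x + intercept for x in extremeness]
r2_line = r_squared(tweet_rate, preds)

# Predicting the mean for every day explains none of the variability.
mean_rate = sum(tweet_rate) / len(tweet_rate)
r2_mean = r_squared(tweet_rate, [mean_rate] * len(tweet_rate))
```

Under this reading, a value of 0.4 would correspond to the "over 40%" figure for the global models and 0.7 to the location-specific average.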


Exploring Social-Historical Ties on Location-Based Social Networks

AAAI Conferences

Location-based social networks (LBSNs) have become a popular form of social media in recent years. They provide location-related services that allow users to "check in" at geographical locations and share such experiences with their friends. Millions of "check-in" records in LBSNs contain rich information about social and geographical context and provide a unique opportunity for researchers to study users' social behavior from a spatial-temporal perspective, which in turn enables a variety of services including place advertisement, traffic forecasting, and disaster relief. In this paper, we propose a social-historical model to explore users' check-in behavior on LBSNs. Our model integrates the social and historical effects and assesses the role of social correlation in users' check-in behavior. In particular, our model captures two properties of a user's check-in history, a power-law distribution and a short-term effect, and helps explain users' check-in behavior. The experimental results on a real-world LBSN demonstrate that our approach properly models users' check-ins and show how social and historical ties can help location prediction.


Distributional Footprints of Deceptive Product Reviews

AAAI Conferences

This paper postulates that there are natural distributions of opinions in product reviews. In particular, we hypothesize that for a given domain, there is a set of representative distributions of review rating scores. A deceptive business entity that hires people to write fake reviews will necessarily distort its distribution of review scores, leaving distributional footprints behind. In order to validate this hypothesis, we introduce strategies to create a dataset with a pseudo-gold standard that is labeled automatically based on different types of distributional footprints. A range of experiments confirm the hypothesized connection between the distributional anomaly and deceptive reviews. This study also provides novel quantitative insights into the characteristics of natural distributions of opinions in the TripAdvisor hotel review and the Amazon product review domains.
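
One way to picture a distributional footprint is to compare an entity's histogram of rating scores against a reference distribution for its domain. The abstract does not say which measure the authors use. The sketch below substitutes a smoothed Kullback-Leibler divergence purely as an illustration, and the function names, the 1 to 5 rating scale and the sample ratings are all hypothetical.

```python
import math
from collections import Counter


def rating_distribution(ratings, scale=(1, 2, 3, 4, 5), smoothing=1.0):
    """Smoothed relative frequency of each score on the rating scale."""
    counts = Counter(ratings)
    total = len(ratings) + smoothing * len(scale)
    return [(counts[s] + smoothing) / total for s in scale]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up ratings: a domain reference, a typical entity, and an all-5-star entity.
reference = rating_distribution([5, 5, 4, 4, 4, 3, 3, 2, 1, 5])
typical = rating_distribution([5, 5, 4, 4, 4, 3, 3, 2, 1, 5])
suspicious = rating_distribution([5] * 10)

score_typical = kl_divergence(typical, reference)
score_suspicious = kl_divergence(suspicious, reference)
```

A larger divergence only flags a distribution as unusual relative to the reference. On its own it does not establish that reviews are deceptive.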


The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City

AAAI Conferences

Studying the social dynamics of a city on a large scale has traditionally been a challenging endeavor, requiring long hours of observation and interviews, usually resulting in only a partial depiction of reality. At the same time, the boundaries of municipal organizational units, such as neighborhoods and districts, are largely statically defined by the city government and do not always reflect the character of life in these areas. To address both difficulties, we introduce a clustering model and research methodology for studying the structure and composition of a city based on the social media its residents generate. We use data from approximately 18 million check-ins collected from users of a location-based online social network. The resulting clusters, which we call Livehoods, are representations of the dynamic urban areas that comprise the city. We take an interdisciplinary approach to validating these clusters, interviewing 27 residents of Pittsburgh, PA, to see how their perceptions of the city project onto our findings there. Our results provide strong support for the discovered clusters, showing how Livehoods reveal the distinctly characterized areas of the city and the forces that shape them.