Automatic Versus Human Navigation in Information Networks

AAAI Conferences

People regularly face tasks that can be understood as navigation in information networks, where the goal is to find a path between two given nodes. In many such situations, the navigator only gets local access to the node currently under inspection and its immediate neighbors. This lack of global information about the network notwithstanding, humans tend to be good at finding short paths, despite the fact that real-world networks are typically very large. One potential reason for this could be that humans possess vast amounts of background knowledge about the world, which they leverage to make good guesses about possible solutions. In this paper we ask the question: Are human-like high-level reasoning skills really necessary for finding short paths? To answer this question, we design a number of navigation agents without such skills, which use only simple numerical features. We evaluate the agents on the task of navigating Wikipedia, a domain for which we also possess large-scale human navigation data. We observe that the agents find shorter paths than humans on average and therefore conclude that, perhaps surprisingly, no sophisticated background knowledge or high-level reasoning is required for navigating the complex Wikipedia network.


The YouTube Social Network

AAAI Conferences

Today, YouTube is the largest user-driven video content provider in the world; it has become a major platform for disseminating multimedia information. A major contribution to its success comes from the user-to-user social experience that differentiates it from traditional content broadcasters. This work examines the social network aspect of YouTube by measuring the full-scale YouTube subscription graph, comment graph, and video content corpus. We find YouTube to deviate significantly from network characteristics that mark traditional online social networks, such as homophily, reciprocative linking, and assortativity. However, compared with the reported characteristics of another content-driven online social network, Twitter, YouTube is remarkably similar. Examining the social and content facets of user popularity, we find a stronger correlation between a user's social popularity and his/her most popular content as opposed to typical content popularity. Finally, we demonstrate an application of our measurements for classifying YouTube Partners, who are selected users that share YouTube's advertisement revenue. Results are motivating despite the highly imbalanced nature of the classification problem.


The Length of Bridge Ties: Structural and Geographic Properties of Online Social Interactions

AAAI Conferences

The popularity of the Web has allowed individuals to communicate and interact with each other on a global scale: people connect both to close friends and acquaintances, creating ties that can bridge otherwise separated groups of people. Recent evidence suggests that spatial distance is still affecting social links established on online platforms, with online ties preferentially connecting closer people. In this work we study the relationships between interaction strength, spatial distance and structural position of ties between members of a large-scale online social networking platform, Tuenti. We discover that ties in highly connected social groups tend to span shorter distances than connections bridging together otherwise separated portions of the network. We also find that such bridging connections have lower social interaction levels than ties within the inner core of the network and ties connecting to its periphery. Our results suggest that spatial constraints on online social networks are intimately connected to structural network properties, with important consequences for information diffusion.


Modeling Spread of Disease from Social Interactions

AAAI Conferences

Research in computational epidemiology to date has concentrated on coarse-grained statistical analysis of populations, often synthetic ones. By contrast, this paper focuses on fine-grained modeling of the spread of infectious diseases throughout a large real-world social network. Specifically, we study the roles that social ties and interactions between specific individuals play in the progress of a contagion. We focus on public Twitter data, where we find that for every health-related message there are more than 1,000 unrelated ones. This class imbalance makes classification particularly challenging. Nonetheless, we present a framework that accurately identifies sick individuals from the content of online communication. Evaluation on a sample of 2.5 million geo-tagged Twitter messages shows that social ties to infected, symptomatic people, as well as the intensity of recent co-location, sharply increase one's likelihood of contracting the illness in the near future. To our knowledge, this work is the first to model the interplay of social activity, human mobility, and the spread of infectious disease in a large real-world population. Furthermore, we provide the first quantifiable estimates of the characteristics of disease transmission on a large scale without active user participation, a step towards our ability to model and predict the emergence of global epidemics from day-to-day interpersonal interactions.


Facebook and Privacy: The Balancing Act of Personality, Gender, and Relationship Currency

AAAI Conferences

Social media profiles are telling examples of the everyday need for disclosure and concealment. The balance between concealment and disclosure varies across individuals, and personality traits might partly explain this variability. Experimental findings on the relationship between information disclosure and personality have so far been inconsistent. We thus study this relationship anew with 1,313 Facebook users in the United States using two personality tests: the Big Five personality test and the self-monitoring test. We model the process of information disclosure in a principled way using Item Response Theory and correlate the resulting user disclosure scores with personality traits. We find a correlation with the trait of Openness and observe gender effects, in that men and women share equal amounts of private information, but men tend to make it more publicly available, well beyond their social circles. Interestingly, geographic (e.g., residence, hometown) and work-related information is used as relationship currency, in that it is selectively shared with social contacts and is rarely shared with the Facebook community at large.


Modeling Destructive Group Dynamics in On-Line Gaming Communities

AAAI Conferences

Social groups often exhibit a high degree of dynamism. Some groups thrive, while many others die over time. Modeling destructive dynamics and understanding whether/why/when a person will depart from a group can be important in a number of social domains. In this paper, we take the World of Warcraft game as an exemplar platform for studying destructive group dynamics. We build models to predict if and when an individual is going to quit his/her guild, and whether this quitting event will inflict substantial damage on the guild. Our predictors start from in-game census data and extract features from multiple perspectives such as individual-level, guild-level, game activity, and social interaction features. Our study shows that destructive group dynamics can often be predicted with modest to high accuracy, and feature diversity is critical to prediction performance.


Have You Heard?: How Gossip Flows Through Workplace Email

AAAI Conferences

We spend a significant part of our lives chatting about other people. In other words, we all gossip. Although gossip is sometimes a contentious topic, various researchers have shown it to be fundamental to social life, from small groups to large, formal organizations. In this paper, we present the first study of gossip in a large CMC corpus. Adopting the Enron email dataset and natural language techniques, we arrive at four main findings. First, workplace gossip is common at all levels of the organizational hierarchy, with people most likely to gossip with their peers. Moreover, employees at the lowest level play a major role in circulating it. Second, gossip appears as often in personal exchanges as it does in formal business communication. Third, by deriving a power-law relation, we show that it is more likely for an email to contain gossip if targeted to a smaller audience. Finally, we explore the sentiment associated with gossip email, finding that gossip is in fact quite often negative: 2.7 times more frequent than positive gossip.


Crossing Media Streams with Sentiment: Domain Adaptation in Blogs, Reviews and Twitter

AAAI Conferences

Most sentiment analysis studies address classification of a single source of data such as reviews or blog posts. However, the multitude of social media sources available for text analysis lends itself naturally to domain adaptation. In this study, we create a dataset spanning three social media sources -- blogs, reviews, and Twitter -- and a set of 37 common topics. We first examine sentiments expressed in these three sources while controlling for the change in topic. Then using this multi-dimensional data we show that when classifying documents in one source (a target source), models trained on other sources of data can be as good as or even better than those trained on the target data. That is, we show that models trained on some social media sources are generalizable to others. All source adaptation models we implement show reviews and Twitter to be the best sources of training data. It is especially useful to know that models trained on Twitter data are generalizable, since, unlike reviews, Twitter is more topically diverse.


On the Study of Social Interactions in Twitter

AAAI Conferences

Twitter and other social media platforms are increasingly used as the primary way in which people speak with each other. As opposed to other platforms, Twitter is interesting in that many of these dialogues are public, and so we can get a view into the dynamics of dialogues and how they differ from other tweet behaviors. Here we analyze tweets gathered from 2400 Twitter streams over a one-month period. We study social interactions in three important dimensions: what are the salient user behaviors in terms of how often they have social interactions and how these interactions are spread among different people; what are the characteristics of the dialogues, or sets of tweets, that we can extract from these interactions; and what are the characteristics of the social network that emerges from considering these interactions? We find that roughly half of the users spend a fair amount of time interacting whereas 40% of users do not seem to have active interactions. We also find that the vast majority of active dialogues only involve two people despite the public nature of these tweets. Finally, we find that while the emerging social network does contain a giant component, the component clearly is a set of well-defined tight clusters which are loosely connected.


Modeling Diffusion in Social Networks Using Network Properties

AAAI Conferences

Diffusion of items occurs in social networks due to the spreading of items through word of mouth and exogenous factors. These items may be news, products, videos, advertisements or contagious viruses. Previous research has studied the diffusion process at both the macro and micro levels. The former models the number of item adopters in the diffusion process while the latter determines which individuals adopt the item. In this paper, we establish a general probabilistic framework, which can be used to derive macro-level diffusion models, including the well-known Bass Model (BM). Using this framework, we develop several other models considering the social network’s degree distribution coupled with the assumption of linear influence by neighboring adopters in the diffusion process. Through evaluation on synthetic data, this paper shows that the degree distribution actually changes during the diffusion process. We therefore introduce a multi-stage diffusion model to cope with variable degree distribution. By conducting experiments on both synthetic and real datasets, we show that our proposed diffusion models can recover the diffusion parameters from the observed diffusion data, which allows us to model diffusion with high accuracy.
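
For readers unfamiliar with the Bass Model mentioned in this abstract, the sketch below evaluates its standard textbook closed form, F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), where p is the coefficient of innovation (exogenous adoption) and q the coefficient of imitation (word of mouth). It is not the paper's probabilistic framework or its degree-aware models. The function names and the market-size parameter m are assumptions chosen for illustration.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Fraction of eventual adopters who have adopted by time t.

    Standard closed-form Bass Model (not the paper's extended models).
    p: coefficient of innovation (must be > 0); q: coefficient of imitation.
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_adopters(t, p, q, m):
    """Expected cumulative number of adopters at time t.

    m is a hypothetical market size (total eventual adopters).
    """
    return m * bass_cumulative_fraction(t, p, q)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for t in range(0, 21, 5):
        print(t, round(bass_adopters(t, p=0.03, q=0.38, m=1000), 1))
```

With q set to 0 the curve reduces to pure exponential adoption, 1 - e^{-pt}, which is a convenient sanity check on any implementation.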