Theoretical progress in understanding the dynamics of spreading processes on graphs suggests the existence of an epidemic threshold below which no epidemics form and above which epidemics spread to a significant fraction of the graph. We have observed information cascades on the social media site Digg that spread fast enough for one initial spreader to infect hundreds of people, yet end up affecting only 0.1% of the entire network. We find that two effects, previously studied in isolation, combine cooperatively to drastically limit the final size of cascades on Digg. First, because of the highly clustered structure of the Digg network, most people who are aware of a story have been exposed to it via multiple friends. This structure lowers the epidemic threshold while moderately slowing the overall growth of cascades. In addition, we find that the mechanism for social contagion on Digg points to a fundamental difference between information spread and other contagion processes: despite multiple opportunities for infection within a social group, people are less likely to become spreaders of information with repeated exposure. The consequences of this mechanism become more pronounced for more clustered graphs. Ultimately, this effect severely curtails the size of social epidemics on Digg.
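The interplay between clustering and declining response to repeated exposure can be illustrated with a small simulation. This sketch is an illustrative assumption, not the authors' model. It runs an independent-cascade process on a ring lattice, used as a highly clustered stand-in for the Digg network. The probability that a user's k-th exposure turns them into a spreader decays as p * decay**(k - 1). The geometric form of this decay and the helper names `ring_lattice`, `cascade_size`, and `mean_size` are hypothetical. Setting decay=1.0 recovers simple contagion.

```python
# Hypothetical helpers for illustration; the response curve is an assumption, not from the paper.
import random
from collections import deque


def ring_lattice(n, k):
    """Highly clustered test graph: node v links to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def cascade_size(adj, seed, p, decay, rng):
    """Independent-cascade spread from `seed`; returns the number of nodes ever infected.

    Each new spreader tries once to convert every not-yet-infected neighbour.
    ASSUMPTION (hypothetical response curve): a node's k-th exposure succeeds with
    probability p * decay**(k - 1); decay=1.0 gives simple contagion.
    """
    infected = {seed}
    exposures = {}  # node -> number of exposures received so far
    frontier = deque([seed])
    while frontier:
        spreader = frontier.popleft()
        for nbr in adj[spreader]:
            if nbr in infected:
                continue
            k = exposures.get(nbr, 0) + 1
            exposures[nbr] = k
            if rng.random() < p * decay ** (k - 1):
                infected.add(nbr)
                frontier.append(nbr)
    return len(infected)


def mean_size(adj, p, decay, trials=200, seed=0):
    """Average final cascade size over `trials` runs started from uniformly random seeds."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = sum(cascade_size(adj, rng.choice(nodes), p, decay, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    g = ring_lattice(1000, 3)
    print("simple contagion:", mean_size(g, 0.4, 1.0))
    print("declining response:", mean_size(g, 0.4, 0.2))
```

Comparing the two printed averages shows roughly how a declining response to repeated exposure shrinks cascades on a clustered graph. On a less clustered, more tree-like graph, fewer users receive repeated exposures before adopting, so the decay parameter matters less.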
Fink, Clay (The Johns Hopkins University) | Schmidt, Aurora C. (The Johns Hopkins University) | Barash, Vladimir (Graphika, Inc.) | Kelly, John (Graphika, Inc.) | Cameron, Christopher (Cornell University) | Macy, Michael (Cornell University)
Social contagion is the mechanism by which ideas and behaviors spread across human social networks. Simple contagion models approximate the probability of adoption as constant with each exposure to an "infected" network neighbor. However, social theory postulates that when adopting an idea or behavior carries personal or social risk, an individual's adoption likelihood also depends on the number of distinct neighbors who have adopted. Such complex contagions are thought to govern the spread of social movements and other important social phenomena. Online sites, such as Twitter, expose social interactions at a large scale and provide an opportunity to observe the spread of social contagions "in the wild." Much of the effort in searching for complex phenomena in real-world contagions focuses on measuring user adoption thresholds. In this work, we present an alternative method for fitting probabilistic complex contagion models to empirical data that avoids measuring thresholds directly, and our results indicate bias in observed thresholds under both complex and simple models. We also show 1) that probabilistic models of simple and complex contagion are distinguishable when applied to an empirical social network with random user activity; and 2) how well these probabilistic adoption models predict observed adoptions of actual hashtags used on Twitter. We use a set of tweets collected from Nigeria in 2014, focusing on 20 popular hashtags and on the follow graphs of the users who adopted the tags during their initial peaks of activity.
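The contrast between simple and complex contagion can be made concrete by treating each model as an adoption-probability function P(adopt | k). Here k is the number of distinct adopting neighbours, and each model is fitted by maximum likelihood to per-user (k, adopted) observations. This is a hypothetical illustration, not the authors' fitting procedure. The simple model is P(k) = 1 - (1 - p)^k. The complex model is an assumed two-level form, with rate p_lo for one adopting neighbour and p_hi for two or more. The two models are compared by log-likelihood and AIC. All function names are hypothetical.

```python
# Hypothetical illustration; model forms and function names are assumptions, not the paper's.
import math


def bernoulli_loglik(pairs, prob):
    """Log-likelihood of (k, adopted) observations under adoption function prob(k)."""
    total = 0.0
    for k, adopted in pairs:
        q = min(max(prob(k), 1e-12), 1.0 - 1e-12)  # keep log() finite
        total += math.log(q) if adopted else math.log(1.0 - q)
    return total


def fit_simple(pairs, grid=1000):
    """Simple contagion: each adopting neighbour independently triggers adoption w.p. p.

    P(k) = 1 - (1 - p)**k, fitted by grid search over p in (0, 1).
    """
    def loglik(p):
        return bernoulli_loglik(pairs, lambda k: 1.0 - (1.0 - p) ** k)

    best = max((i / grid for i in range(1, grid)), key=loglik)
    return best, loglik(best)


def fit_complex(pairs):
    """HYPOTHETICAL two-level complex contagion (observations assumed to have k >= 1):
    rate p_lo with exactly one adopting neighbour, p_hi with two or more.
    The MLE is the empirical adoption rate within each group."""
    def rate(outcomes):
        # With no observations any value is an MLE; 0.5 is an arbitrary placeholder.
        return sum(outcomes) / len(outcomes) if outcomes else 0.5

    p_lo = rate([adopted for k, adopted in pairs if k == 1])
    p_hi = rate([adopted for k, adopted in pairs if k >= 2])
    return (p_lo, p_hi), bernoulli_loglik(pairs, lambda k: p_lo if k == 1 else p_hi)


def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik
```

On data where a second adopting neighbour sharply raises adoption, the two-level model reaches a higher log-likelihood and a lower AIC than the simple model. With real data, the (k, adopted) pairs would be derived from the follow graph and adoption times. This sketch takes those pairs as given.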
What will social media sites of tomorrow look like? What behaviors will their interfaces enable? A major challenge for designing new sites that allow a broader range of user actions is the difficulty of extrapolating from experience with current sites without first distinguishing correlations from underlying causal mechanisms. The growing availability of data on user activities provides new opportunities to uncover correlations among user activity, contributed content and the structure of links among users. However, such correlations do not necessarily translate into predictive models. Instead, empirically grounded mechanistic models provide a stronger basis for establishing causal mechanisms and discovering the underlying statistical laws governing social behavior. We describe a statistical physics-based framework for modeling and analyzing social media and illustrate its application to the problems of prediction and inference. We hope these examples will inspire the research community to use these methods in searching for empirically valid causal mechanisms that explain the observed correlations.
In many real-world scenarios, it is nearly impossible to collect explicit social network data. In such cases, whole networks must be inferred from underlying observations. Here, we formulate the problem of inferring latent social networks from network diffusion or disease propagation data. We consider contagions propagating over the edges of an unobserved social network, where we observe only the times at which nodes became infected, not who infected them. Given such node infection times, we identify the optimal network that best explains the observed data. We present a maximum likelihood approach based on convex programming with an L1-like penalty term that encourages sparsity. Experiments on real and synthetic data show that our method nearly perfectly recovers both the underlying network structure and the parameters of the contagion propagation model. Moreover, our approach scales well: it can infer optimal networks on thousands of nodes in a matter of minutes.
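Penalized maximum-likelihood inference from infection times can be sketched for a single target node. This is a swapped-in illustration, not the authors' convex program. It assumes a NetRate-style continuous-time model in which each infected node j infects the target at a constant rate alpha[j] >= 0, so the log-likelihood is concave in alpha. An L1 penalty lam * sum(alpha) pushes weak edges toward zero. The sketch uses a minorize-maximize (EM-style) update, alpha[j] = credit[j] / (exposure[j] + lam), rather than a general convex solver. This update never decreases the penalized objective. The function name `infer_parents` and its data layout are hypothetical: one dict of node -> infection time per cascade, plus a shared observation horizon.

```python
# Hypothetical helper; a NetRate-style exponential model swapped in for illustration.
def infer_parents(cascades, target, horizon, lam=0.1, iters=500):
    """Estimate transmission rates alpha[j] >= 0 from each node j into `target`.

    ASSUMED model: an infected node j infects `target` at constant rate alpha[j].
    Penalised log-likelihood being maximised:
      + sum over cascades where target is infected at t_i (and is not the seed) of
          log(sum_{j: t_j < t_i} alpha[j]) - sum_{j: t_j < t_i} alpha[j] * (t_i - t_j)
      - sum over cascades where target is never infected of
          sum_j alpha[j] * (horizon - t_j)
      - lam * sum_j alpha[j]            (lam must be > 0)
    `cascades` is a list of {node: infection_time} dicts (uninfected nodes omitted),
    all observed over [0, horizon].
    """
    candidates = sorted({j for c in cascades for j in c if j != target})
    exposure = {j: 0.0 for j in candidates}  # total time j was infected while target was not
    parent_sets = []  # possible infectors of target, one list per informative cascade
    for c in cascades:
        t_i = c.get(target)
        if t_i is None:
            for j, t_j in c.items():
                exposure[j] += horizon - t_j
            continue
        parents = [j for j, t_j in c.items() if j != target and t_j < t_i]
        for j in parents:
            exposure[j] += t_i - c[j]
        if parents:  # a seed infection carries no information about incoming edges
            parent_sets.append(parents)

    alpha = {j: 1.0 for j in candidates}
    for _ in range(iters):
        # E-like step: split credit for each infection among its possible parents.
        credit = {j: 0.0 for j in candidates}
        for parents in parent_sets:
            total = sum(alpha[j] for j in parents)
            if total > 0.0:
                for j in parents:
                    credit[j] += alpha[j] / total
        # M-like step: closed-form maximiser of the minorised, L1-penalised objective.
        alpha = {j: credit[j] / (exposure[j] + lam) for j in candidates}
    return alpha
```

Running this once per node, and keeping edges whose estimated rate exceeds a small cutoff, gives an estimate of the whole network. Larger values of lam shrink all rates and trade recall for sparsity.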
The main purpose of this paper is to introduce a graph-valued stochastic process to model the spread of a communicable infectious disease. The major novelty of the SIR model we promote is that the social network on which the epidemic takes place is not specified in advance but evolves through time, accounting for the temporal evolution of the interactions involving infective individuals. Without assuming the existence of a fixed underlying network model, the stochastic process introduced describes, in a flexible and realistic manner, epidemic spread in non-uniformly mixing and possibly heterogeneous populations. It is shown how to fit such a (parametrised) model by means of Approximate Bayesian Computation methods based on graph-valued statistics. The concepts and statistical methods described in this paper are finally applied to a real epidemic dataset, related to the spread of HIV in Cuba in the presence of a contact tracing system, which makes it possible to partly reconstruct the evolution of the graph of sexual partners diagnosed HIV-positive between 1986 and 2006.
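Approximate Bayesian Computation can be illustrated in its simplest, rejection-sampling form. This sketch departs from the paper in several assumed ways. It uses a homogeneously mixing SIR model rather than a graph-valued process. It summarizes each epidemic with a single scalar statistic, the final epidemic size, rather than graph-valued statistics. It places a uniform prior on the infection rate beta and fixes the recovery rate gamma. The function names are hypothetical. A parameter draw is kept whenever its simulated summary lies within a tolerance of the observed one, and the accepted draws approximate the posterior.

```python
# Hypothetical helpers; generic ABC rejection, not the paper's graph-valued method.
import random


def simulate_sir_final_size(beta, gamma, n, i0, rng):
    """Stochastic SIR in a homogeneously mixing population of size n.

    Only the final size is needed, so the sketch follows the embedded jump chain of
    the Markov process: each event is an infection with probability
    rate_inf / (rate_inf + rate_rec), otherwise a recovery. Requires gamma > 0.
    """
    s, i = n - i0, i0
    while i > 0:
        rate_inf = beta * s * i / n
        rate_rec = gamma * i
        if rng.random() * (rate_inf + rate_rec) < rate_inf:
            s, i = s - 1, i + 1
        else:
            i -= 1
    return n - s  # everyone who was ever infected


def abc_rejection(observed_size, n, i0, gamma, prior_draw, tolerance, n_draws, seed=0):
    """ABC rejection sampler for beta, using final size as the (assumed) summary statistic."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        beta = prior_draw(rng)
        simulated = simulate_sir_final_size(beta, gamma, n, i0, rng)
        if abs(simulated - observed_size) <= tolerance:
            accepted.append(beta)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed_size=160, n=200, i0=1, gamma=1.0,
                              prior_draw=lambda r: r.uniform(0.0, 5.0),
                              tolerance=10, n_draws=4000)
    if posterior:
        print(len(posterior), "accepted; posterior mean beta =", sum(posterior) / len(posterior))
```

With an observed final size of 160 in a population of 200 and gamma = 1, the accepted values of beta cluster around the value implied by the deterministic final-size relation z = 1 - exp(-(beta/gamma) z), which is roughly 2 for z = 0.8. Shrinking the tolerance sharpens the approximation at the cost of fewer accepted draws.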