Goto

Collaborating Authors

 giant component


On Balancing Sparsity with Reliable Connectivity in Distributed Network Design with Random K-out Graphs

arXiv.org Artificial Intelligence

In several applications in distributed systems, an important design criterion is ensuring that the network is sparse, i.e., does not contain too many edges, while achieving reliable connectivity. Sparsity ensures communication overhead remains low, while reliable connectivity is tied to reliable communication and inference on decentralized data reservoirs and computational resources. A class of network models called random K-out graphs appear widely as a heuristic to balance connectivity and sparsity, especially in settings with limited trust, e.g., privacy-preserving aggregation of networked data in which networks are deployed. However, several questions remain regarding how to choose network parameters in response to different operational requirements, including the need to go beyond asymptotic results and the ability to model the stochastic and adversarial environments. To address this gap, we present theorems to inform the choice of network parameters that guarantee reliable connectivity in regimes where nodes can be finite or unreliable. We first derive upper and lower bounds for probability of connectivity in random K-out graphs when the number of nodes is finite. Next, we analyze the property of r-robustness, a stronger notion than connectivity that enables resilient consensus in the presence of malicious nodes. Finally, motivated by aggregation mechanisms based on pairwise masking, we model and analyze the impact of a subset of adversarial nodes, modeled as deletions, on connectivity and giant component size - metrics that are closely tied to privacy guarantees. Together, our results pave the way for end-to-end performance guarantees for a suite of algorithms for reliable inference on networks.


Geodesic statistics for random network families

arXiv.org Machine Learning

A key task in the study of networked systems is to derive local and global properties that impact connectivity, synchronizability, and robustness. Computing shortest paths or geodesics in the network yields measures of node centrality and network connectivity that can contribute to explain such phenomena. We derive an analytic distribution of shortest path lengths, on the giant component in the supercritical regime or on small components in the subcritical regime, of any sparse (possibly directed) graph with conditionally independent edges, in the infinite-size limit. We provide specific results for widely used network families like stochastic block models, dot-product graphs, random geometric graphs, and graphons. The survival function of the shortest path length distribution possesses a simple closed-form lower bound which is asymptotically tight for finite lengths, has a natural interpretation of traversing independent geodesics in the network, and delivers novel insight in the above network families. Notably, the shortest path length distribution allows us to derive, for the network families above, important graph properties like the bond percolation threshold, size of the giant component, average shortest path length, and closeness and betweenness centralities. We also provide a corroborative analysis of a set of 20 empirical networks. This unifying framework demonstrates how geodesic statistics for a rich family of random graphs can be computed cheaply without having access to true or simulated networks, especially when they are sparse but prohibitively large.


Inferring urban social networks from publicly available data

arXiv.org Artificial Intelligence

Defining accurate models for real-world social networks is instrumental in several research fields, e.g., in sociology [1], epidemiology [2] or marketing [3]. In combination with computer simulations these models may represent a valuable tool to understand social phenomena, along with classic analytical studies. Dynamic processes, such as the spread of a disease or a rumour, can be represented upon suitable networks that encode the patterns of connection and interaction among the individuals of a population. Moreover, the comparison of synthetic networks produced by different generative models helps to infer how each factor contributes to the emergence of experimentally measured properties of real networks [4]. In this paper, we present a novel computational model for urban social networks, that combines a data-driven framework with a set of adjustable parameters. A fully operational open source implementation of the model is available under the GPL v3 at gitlab.com/cranic-group/usn. The software allows to generate a synthetic social network of "strong ties" [5] among geo-referenced and age-stratified individuals. The graph encodes information on the urban social fabric and, as such, it increases the plausibility of dynamic (e.g., transmission) processes that may be influenced by preferences and actions of agents and groups of related agents. On the one hand, our social graph may be used to simulate the fact that friends and relatives may go out together, organize public or private meetings, and are, in general, more likely to interact.


Impossibility of Partial Recovery in the Graph Alignment Problem

arXiv.org Machine Learning

Random graph alignment refers to recovering the underlying vertex correspondence between two random graphs with correlated edges. This can be viewed as an average-case and noisy version of the well-known NP-hard graph isomorphism problem. For the correlated Erd\"os-R\'enyi model, we prove an impossibility result for partial recovery in the sparse regime, with constant average degree and correlation, as well as a general bound on the maximal reachable overlap. Our bound is tight in the noiseless case (the graph isomorphism problem) and we conjecture that it is still tight with noise. Our proof technique relies on a careful application of the probabilistic method to build automorphisms between tree components of a subcritical Erd\"os-R\'enyi graph.


A unified framework for spectral clustering in sparse graphs

arXiv.org Machine Learning

One of the most natural tasks in graph theory is community detection, i.e., the identification of similarity groups on a given network. Practically, for an unweighted and undirected graph G(V, E) with V n nodes and E edges, community detection consists in finding a non-overlapping partition of the nodes that identifies underlying communities in a completely unsupervised manner. There is no unique definition of a community, but a general criterion is to impose that nodes in the same community have more interconnections than nodes in different communities, as a consequence of the stronger affinity among members of the same community [17]. There exist many ways of formalizing this intuition, some of them under the form of a cost function to minimize, such as the MinCut, RatioCut, and NormalizedCut costs [53]. The resulting optimizations are however NPhard problems and, as a consequence, many algorithms consist in retrieving relaxed continuous solutions of the problem.


Complex Network Analysis of Men Single ATP Tennis Matches

arXiv.org Artificial Intelligence

During the last decades Network Science field has been rediscovered and addressed as the "new science" [2], [3]. A lot of issues have been (re-)examined thanks to Network Science techniques, which are nowadays permeating the way we face the world as a unique interconnected component. The presence and the immediate availability of a huge amount of digital data describing every kind of network and the way in which its nodes interact, has made possible an interdisciplinary analysis of many large-scale systems. Similar techniques have been recently applied also to professional sports, in order to discover complex interactions phenomena and universal rules which are almost invisible and difficult to recognize restricting the attention to small networks or to microscopic level. For example, complexnetwork analysis were conducted on soccer (e.g. in [4] and [5]), football ([6] and [7]), basket ([8] and [9]), baseball ([10]) and cricket ([11] and [12]), just to name a few. In professional tennis as well, there are few studies examining how to map matches into complex networks and then developing new ranking methods alternative to the ATP (Association of Tennis Professionals) official one. The first work of this kind is represented by [13], where the authors explained the network generation and then they performed some simple analysis on single Grand Slams tournaments matches only (i.e.


Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models

arXiv.org Machine Learning

The Gibbs sampler is a particularly popular Markov chain used for learning and inference problems in Graphical Models (GMs). These tasks are computationally intractable in general, and the Gibbs sampler often suffers from slow mixing. In this paper, we study the Swendsen-Wang dynamics which is a more sophisticated Markov chain designed to overcome bottlenecks that impede the Gibbs sampler. We prove O(\log n) mixing time for attractive binary pairwise GMs (i.e., ferromagnetic Ising models) on stochastic partitioned graphs having n vertices, under some mild conditions, including low temperature regions where the Gibbs sampler provably mixes exponentially slow. Our experiments also confirm that the Swendsen-Wang sampler significantly outperforms the Gibbs sampler when they are used for learning parameters of attractive GMs.


Community Detection in Networks using Graph Distance

arXiv.org Machine Learning

The study of networks has received increased attention recently not only from the social sciences and statistics but also from physicists, computer scientists and mathematicians. One of the principal problem in networks is community detection. Many algorithms have been proposed for community finding but most of them do not have have theoretical guarantee for sparse networks and networks close to the phase transition boundary proposed by physicists. There are some exceptions but all have some incomplete theoretical basis. Here we propose an algorithm based on the graph distance of vertices in the network. We give theoretical guarantees that our method works in identifying communities for block models and can be extended for degree-corrected block models and block models with the number of communities growing with number of vertices. Despite favorable simulation results, we are not yet able to conclude that our method is satisfactory for worst possible case. We illustrate on a network of political blogs, Facebook networks and some other networks.


Benefits of Semantics on Web Service Composition from a Complex Network Perspective

arXiv.org Artificial Intelligence

The number of publicly available Web services (WS) is continuously growing, and in parallel, we are witnessing a rapid development in semantic-related web technologies. The intersection of the semantic web and WS allows the development of semantic WS. In this work, we adopt a complex network perspective to perform a comparative analysis of the syntactic and semantic approaches used to describe WS. From a collection of publicly available WS descriptions, we extract syntactic and semantic WS interaction networks. We take advantage of tools from the complex network field to analyze them and determine their properties. We show that WS interaction networks exhibit some of the typical characteristics observed in real-world networks, such as short average distance between nodes and community structure. By comparing syntactic and semantic networks through their properties, we show the introduction of semantics in WS descriptions should improve the composition process.


On the Study of Social Interactions in Twitter

AAAI Conferences

Twitter and other social media platforms are increasingly used as the primary way in which people speak with each other. As opposed to other platforms, Twitter is interesting in that many of these dialogues are public and so we can get a view into the dynamics of dialogues and how they differ from other other tweet behaviors. We here analyze tweets gathered from 2400 twitter streams over a one month period. We study social interactions in three important dimensions: what are the salient user behaviors in terms of how often they have social interactions and how these interactions are spread among different people; what are the characteristics of the dialogues, or sets of tweets, that we can extract from these interactions, and what are the characteristics of the social network which emerges from considering these interactions? We find that roughly half of the users spend a fair amount of time interacting whereas 40% of users do not seem to have active interactions. We also find that the vast majority of active dialogues only involve two people despite the public nature of these tweets. We finally find that while the emerging social network does contain a giant component, the component clearly is a set of well-defined tight clusters which are loosely connected.