Elliott, Andrew
L2G2G: a Scalable Local-to-Global Network Embedding with Graph Autoencoders
Ouyang, Ruikang, Elliott, Andrew, Limnios, Stratis, Cucuringu, Mihai, Reinert, Gesine
For analysing real-world networks, graph representation learning is a popular tool. These methods, such as graph autoencoders (GAEs), typically rely on low-dimensional representations, also called embeddings, which are obtained by minimising a loss function; these embeddings are used with a decoder for downstream tasks such as node classification and edge prediction. While GAEs tend to be fairly accurate, they suffer from scalability issues. For improved speed, a Local2Global approach, which combines graph patch embeddings via eigenvector synchronisation, was shown to be fast while achieving good accuracy. Here we propose L2G2G, a Local2Global method which improves GAE accuracy without sacrificing scalability. This improvement is achieved by dynamically synchronising the latent node representations while training the GAEs. The method also benefits from the decoder computing only a local patch loss. Hence, aligning the local embeddings in each epoch utilises more information from the graph than a single post-training alignment does, while maintaining scalability. We illustrate on synthetic benchmarks, as well as real-world examples, that L2G2G achieves higher accuracy than the standard Local2Global approach and scales efficiently on larger data sets. We find that, for large and dense networks, it even outperforms the slower, but assumed more accurate, GAEs.
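As a rough illustration of the patch-alignment idea: given embeddings of two patches that share nodes, one patch can be rotated onto the other with an orthogonal Procrustes solve on the shared nodes. This is a minimal two-patch sketch (function and variable names are ours); the actual Local2Global pipeline synchronises orthogonal transformations, reflections, and translations across many patches via eigenvector synchronisation.

```python
import numpy as np

def align_patch(X_ref, X_patch, overlap_ref, overlap_patch):
    """Rotate X_patch so its overlap nodes best match X_ref (Procrustes).

    X_ref, X_patch: (n_i, d) patch embeddings.
    overlap_ref, overlap_patch: index arrays for the shared nodes.
    """
    A = X_ref[overlap_ref]      # target coordinates of shared nodes
    B = X_patch[overlap_patch]  # source coordinates of shared nodes
    # Orthogonal Procrustes: R = argmin_{R orthogonal} ||A - B R||_F
    U, _, Vt = np.linalg.svd(B.T @ A)
    R = U @ Vt
    return X_patch @ R

# Toy example: the second patch is a rotated copy of the first,
# so alignment should recover the first patch exactly.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(10, 2))
theta = np.pi / 3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
X2 = X1 @ rot  # same nodes, embedding rotated by 60 degrees
X2_aligned = align_patch(X1, X2, np.arange(10), np.arange(10))
assert np.allclose(X2_aligned, X1, atol=1e-8)
```

In L2G2G, an alignment of this flavour is applied to the latent node representations during training rather than once after training, which is what lets each epoch's loss see globally consistent coordinates.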
DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series
Clarkson, Jase, Cucuringu, Mihai, Elliott, Andrew, Reinert, Gesine
Generative models for network time series (also known as dynamic graphs) have tremendous potential in fields such as epidemiology, biology and economics, where complex graph-based dynamics are core objects of study. Designing flexible and scalable generative models is a very challenging task due to the high dimensionality of the data, as well as the need to represent temporal dependencies and marginal network structure. Here we introduce DAMNETS, a scalable deep generative model for network time series. DAMNETS outperforms competing methods on all of our measures of sample quality, over both real and synthetic data sets.
SaGess: Sampling Graph Denoising Diffusion Model for Scalable Graph Generation
Limnios, Stratis, Selvaraj, Praveen, Cucuringu, Mihai, Maple, Carsten, Reinert, Gesine, Elliott, Andrew
Over recent years, denoising diffusion generative models have come to be considered as state-of-the-art methods for synthetic data generation, especially in the case of generating images. These approaches have also proved successful in other applications such as tabular and graph data generation. However, due to computational complexity, to date the application of these techniques to graph data has been restricted to small graphs, such as those used in molecular modeling. In this paper, we propose SaGess, a discrete denoising diffusion approach, which is able to generate large real-world networks by augmenting a diffusion model (DiGress) with a generalized divide-and-conquer framework. The algorithm is capable of generating larger graphs by sampling a covering of subgraphs of the initial graph in order to train DiGress. SaGess then constructs a synthetic graph using the subgraphs that have been generated by DiGress. We evaluate the quality of the synthetic data sets against several competitor methods by comparing graph statistics between the original and synthetic samples, as well as evaluating the utility of the synthetic data set produced by using it to train a task-driven model, namely link prediction. In our experiments, SaGess outperforms most of the one-shot state-of-the-art graph generation methods by a significant factor, both on the graph metrics and on the link prediction task.
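To illustrate the divide-and-conquer idea, the following is a simplified, hypothetical sketch of sampling a covering of subgraphs: each patch is seeded on a still-uncovered edge and grown by breadth-first search up to a size cap, repeating until every edge of the original graph belongs to some patch. SaGess's actual sampler and the DiGress training step are more involved.

```python
import random

def sample_edge_covering(edges, patch_size, seed=0):
    """Sample node patches until every edge is covered by some patch.

    A simplified stand-in for a subgraph-covering sampler: each patch
    starts from an uncovered edge and grows by BFS to `patch_size` nodes.
    Returns a list of (patch_nodes, patch_edges) pairs.
    """
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    uncovered = {tuple(sorted(e)) for e in edges}
    patches = []
    while uncovered:
        u, v = rng.choice(sorted(uncovered))  # seed on an uncovered edge
        nodes = [u, v]
        seen = {u, v}
        for node in nodes:                    # BFS growth up to patch_size
            if len(nodes) >= patch_size:
                break
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    nodes.append(nbr)
                    if len(nodes) >= patch_size:
                        break
        patch_nodes = set(nodes)
        # edges of the patch that were still uncovered
        patch_edges = {e for e in uncovered
                       if e[0] in patch_nodes and e[1] in patch_nodes}
        patches.append((patch_nodes, patch_edges))
        uncovered -= patch_edges
    return patches

# toy graph: a 6-cycle with one chord
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)]
patches = sample_edge_covering(edges, patch_size=4)
assert set().union(*(pe for _, pe in patches)) == set(edges)
assert all(len(pn) <= 4 for pn, _ in patches)
```

Each loop iteration covers at least the seeding edge, so the sampler always terminates; the resulting patches are what a model like DiGress would then be trained on before the synthetic graph is reassembled from generated subgraphs.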
Agent swarms: cooperation and coordination under stringent communications constraint
Kinsler, Paul, Holman, Sean, Elliott, Andrew, Mitchell, Cathryn N., Wilson, R. Eddie
Here we consider the communications tactics appropriate for a group of agents that need to "swarm" together in a highly adversarial environment. Specifically, whilst they need to cooperate by exchanging information with each other about their location and their plans, at the same time they also need to keep such communications to an absolute minimum. This might be due to a need for stealth, or otherwise be relevant to situations where communications are significantly restricted. Complicating this process is that we assume each agent (a) has no means of passively locating others, (b) must rely on being updated by reception of appropriate messages, and (c) if no such update messages arrive, will find its beliefs about other agents gradually becoming out of date and increasingly inaccurate. Here we use a geometry-free multi-agent model that is capable of allowing for message-based information transfer between agents with different intrinsic connectivities, as would be present in a spatial arrangement of agents. We present agent-centric performance metrics that require only minimal assumptions, and show how simulated outcome distributions, risks, and connectivities depend on the ratio of information gain to loss. We also show that checking for too-long round-trip times can be an effective minimal-information filter for determining which agents to no longer target with messages.
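The belief-ageing and round-trip-time ideas can be sketched as follows (all names, fields, and thresholds here are hypothetical, not the paper's model): an agent's positional uncertainty about a peer grows with the time since the last received update, and peers whose pings have gone unanswered for longer than a round-trip threshold are dropped as message targets.

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """One agent's belief about another agent (illustrative fields)."""
    position: tuple    # last reported (x, y)
    last_update: float # time the last message about this peer arrived
    last_ping: float   # time we last sent this peer a message

def position_uncertainty(belief, now, max_speed):
    # With no updates, the peer could have moved at most this far,
    # so the belief degrades linearly in the time since the last update.
    return max_speed * (now - belief.last_update)

def should_drop_target(belief, now, rtt_threshold):
    # Minimal-information round-trip filter: if we pinged this peer
    # after its last update and have heard nothing for longer than the
    # round-trip threshold, stop targeting it with messages.
    awaiting_reply = belief.last_ping > belief.last_update
    return awaiting_reply and (now - belief.last_ping) > rtt_threshold

b = Belief(position=(0.0, 0.0), last_update=10.0, last_ping=12.0)
assert position_uncertainty(b, now=20.0, max_speed=1.5) == 15.0
assert should_drop_target(b, now=20.0, rtt_threshold=5.0)
assert not should_drop_target(b, now=14.0, rtt_threshold=5.0)
```

The trade-off the abstract studies, the ratio of information gain to loss, corresponds here to how fast updates arrive relative to how fast the uncertainty term grows.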
TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data
Houssiau, Florimond, Jordon, James, Cohen, Samuel N., Daniel, Owen, Elliott, Andrew, Geddes, James, Mole, Callum, Rangel-Smith, Camila, Szpruch, Lukasz
Personal data collected at scale promises to improve decision-making and accelerate innovation. However, sharing and using such data raises serious privacy concerns. A promising solution is to produce synthetic data, artificial records to share instead of real data. Since synthetic records are not linked to real persons, this intuitively prevents classical re-identification attacks. However, this is insufficient to protect privacy. We here present TAPAS, a toolbox of attacks to evaluate synthetic data privacy under a wide range of scenarios. These attacks include generalizations of prior works and novel attacks. We also introduce a general framework for reasoning about privacy threats to synthetic data and showcase TAPAS on several examples.
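As a toy illustration of the kind of attack such a toolbox evaluates (not TAPAS's actual API or attack suite), a naive "closest synthetic record" membership guess flags a target individual as a training-data member if any synthetic record lies unusually close to their record; the threshold and distance metric are illustrative choices.

```python
import numpy as np

def closest_record_attack(synthetic, target, threshold):
    """Guess that `target` was in the generator's training data if some
    synthetic record lies within `threshold` of it (Euclidean distance).

    A deliberately naive baseline; real audits calibrate such attacks
    and measure their success rates over many targets.
    """
    distances = np.linalg.norm(synthetic - target, axis=1)
    return bool(distances.min() <= threshold)

synthetic = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])
# a target very close to a synthetic record is flagged...
assert closest_record_attack(synthetic, np.array([5.1, 4.9]), threshold=0.5)
# ...while a distant target is not
assert not closest_record_attack(synthetic, np.array([2.5, 2.5]), threshold=0.5)
```

Attacks of this flavour show why "synthetic records are not linked to real persons" is not by itself a privacy guarantee: a generator that memorises training records can leak them even through purely synthetic output.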