Graphon based Clustering and Testing of Networks: Algorithms and Theory

Sabanayagam, Mahalakshmi; Vankadara, Leena Chennuru; Ghoshdastidar, Debarghya

arXiv.org Machine Learning 

Network-valued data are encountered in a wide range of applications, and pose challenges in learning due to their complex structure and the absence of vertex correspondence. Typical examples of such problems include classification or grouping of protein structures and social networks. Various methods, ranging from graph kernels to graph neural networks, have been proposed and achieve some success in graph classification problems. However, most methods have limited theoretical justification, and their applicability beyond classification remains unexplored. In this work, we propose methods for clustering multiple graphs, without vertex correspondence, that are inspired by the recent literature on estimating graphons, which are symmetric functions corresponding to the infinite vertex limit of graphs. We propose a novel graph distance based on sorting-and-smoothing graphon estimators. Using the proposed graph distance, we present two clustering algorithms and show that they achieve state-of-the-art results. We prove the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees. We further study the applicability of the proposed distance for graph two-sample testing problems.

Machine learning on graphs has evolved considerably over the past two decades. The traditional view of network analysis is limited to modelling interactions among entities of interest, for instance social networks or the World Wide Web, and learning algorithms based on graph theory have been commonly used to solve these problems (Von Luxburg, 2007; Yan et al., 2006).
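
To make the idea of a distance built on sorting-and-smoothing graphon estimators concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common form of such an estimator: vertices are sorted by degree, the sorted adjacency matrix is averaged over a fixed grid of blocks, and two graphs are compared through the Frobenius norm of the difference between their block averages. The function names (`sort_and_smooth`, `graph_distance`, `sample_er`), the parameter `n_blocks`, and the normalisation by `n_blocks` are all illustrative assumptions.

```python
import numpy as np


def sort_and_smooth(adj, n_blocks):
    """Degree-sort the vertices, then average the adjacency matrix over blocks.

    Assumed form of a sorting-and-smoothing estimator; returns an
    n_blocks x n_blocks matrix regardless of the number of vertices.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    if not 1 <= n_blocks <= n:
        raise ValueError("n_blocks must be between 1 and the number of vertices")
    # Sorting step: order vertices by decreasing degree.
    order = np.argsort(-adj.sum(axis=1), kind="stable")
    sorted_adj = adj[np.ix_(order, order)]
    # Smoothing step: split sorted vertices into nearly equal groups
    # and take the mean edge density of every pair of groups.
    groups = np.array_split(np.arange(n), n_blocks)
    est = np.empty((n_blocks, n_blocks))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            est[i, j] = sorted_adj[np.ix_(gi, gj)].mean()
    return est


def graph_distance(adj_a, adj_b, n_blocks=5):
    """Normalised Frobenius distance between the two block estimates."""
    diff = sort_and_smooth(adj_a, n_blocks) - sort_and_smooth(adj_b, n_blocks)
    return np.linalg.norm(diff, "fro") / n_blocks


def sample_er(n, p, rng):
    """Sample a symmetric Erdos-Renyi adjacency matrix without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


# Toy data: two sparse graphs and one dense graph, all of different sizes.
rng = np.random.default_rng(0)
g1 = sample_er(60, 0.2, rng)
g2 = sample_er(80, 0.2, rng)
g3 = sample_er(70, 0.7, rng)

d_same = graph_distance(g1, g2)  # same edge density
d_diff = graph_distance(g1, g3)  # different edge density
print(f"same model: {d_same:.3f}, different model: {d_diff:.3f}")
```

Because every graph is reduced to a matrix of the same fixed size, graphs with different numbers of vertices and no vertex correspondence can be compared directly, and the resulting pairwise distances could then be passed to a standard clustering routine.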