Community Detection and Classification Guarantees Using Embeddings Learned by Node2Vec
Davison, Andrew, Morgan, S. Carlyle, Ward, Owen G.
Within network science, a widely applicable and important inference task is to understand how the behavior of interactions between different units (nodes) within the network depend on their latent characteristics. This occurs within a wide array of disciplines, from sociological (Freeman, 2004) to biological (Luo et al., 2007) networks. One simple and interpretable model for such a task is the stochastic block model (SBM) (Holland et al., 1983) which assumes that nodes within the network are assigned a discrete community label. Edges between nodes in the network are then formed independently across all pairs of edges, conditional on these community assignments. While such a model is simplistic, it and various extensions, such as the degree corrected SBM (DCSBM), used to handle degree heterogenity (Karrer and Newman, 2011), and mixed-membership SBMs, to allow for more complex community structures (Airoldi, Blei, Fienberg, and Xing, 2008), have seen a wide degree of empirical success (Latouche et al., 2011; Legramanti et al., 2022; Airoldi, Blei, Fienberg, Xing, and Jaakkola, 2006). One restriction of the stochastic block model and its generalizations is the requirement for a discrete community assignment as a latent representation of the units within the network. While the statistical community has previously considered more flexible latent representations (Hoff et al., 2002), over the past decade, there have been significant advancements in general embedding methods for networks, which produce general vector representations of units within a network, and generally achieve start-of-the-art performance in downstream tasks for node classification and link prediction. An early example of such a method is spectral clustering (Ng et al., 2001), which constructs an embedding of the nodes in the network from an eigendecomposition of the graph Laplacian. The k smallest non zero eigenvectors provides a k dimensional representation of each of the nodes in the network.
Oct-26-2023
- Country:
- North America > United States (0.47)
- Genre:
- Overview (0.67)
- Research Report > New Finding (0.46)
- Technology: