Goto

Collaborating Authors

 Ward, Owen G.


Community Detection and Classification Guarantees Using Embeddings Learned by Node2Vec

arXiv.org Machine Learning

Within network science, a widely applicable and important inference task is to understand how the behavior of interactions between different units (nodes) within the network depend on their latent characteristics. This occurs within a wide array of disciplines, from sociological (Freeman, 2004) to biological (Luo et al., 2007) networks. One simple and interpretable model for such a task is the stochastic block model (SBM) (Holland et al., 1983) which assumes that nodes within the network are assigned a discrete community label. Edges between nodes in the network are then formed independently across all pairs of edges, conditional on these community assignments. While such a model is simplistic, it and various extensions, such as the degree corrected SBM (DCSBM), used to handle degree heterogenity (Karrer and Newman, 2011), and mixed-membership SBMs, to allow for more complex community structures (Airoldi, Blei, Fienberg, and Xing, 2008), have seen a wide degree of empirical success (Latouche et al., 2011; Legramanti et al., 2022; Airoldi, Blei, Fienberg, Xing, and Jaakkola, 2006). One restriction of the stochastic block model and its generalizations is the requirement for a discrete community assignment as a latent representation of the units within the network. While the statistical community has previously considered more flexible latent representations (Hoff et al., 2002), over the past decade, there have been significant advancements in general embedding methods for networks, which produce general vector representations of units within a network, and generally achieve start-of-the-art performance in downstream tasks for node classification and link prediction. An early example of such a method is spectral clustering (Ng et al., 2001), which constructs an embedding of the nodes in the network from an eigendecomposition of the graph Laplacian. The k smallest non zero eigenvectors provides a k dimensional representation of each of the nodes in the network.


Online Community Detection for Event Streams on Networks

arXiv.org Machine Learning

A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, observed connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the temporal dynamic component. A natural way to take account of this temporal dynamic component of interactions is to use point processes as the foundation of the network models for community detection. Computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for learning the community structure underlying dynamic event arrivals on a network using continuous-time point process latent network models. We provide regret bounds on the loss function of this procedure, giving theoretical guarantees on performance. The proposed algorithm is illustrated, using both simulation studies and real data, to have comparable performance in terms of community structure in terms of community recovery to non-online variants. Our proposed framework can also be readily modified to incorporate other popular network structures.


Next Waves in Veridical Network Embedding

arXiv.org Machine Learning

Embedding nodes of a large network into a metric (e.g., Euclidean) space has become an area of active research in statistical machine learning, which has found applications in natural and social sciences. Generally, a representation of a network object is learned in a Euclidean geometry and is then used for subsequent tasks regarding the nodes and/or edges of the network, such as community detection, node classification and link prediction. Network embedding algorithms have been proposed in multiple disciplines, often with domain-specific notations and details. In addition, different measures and tools have been adopted to evaluate and compare the methods proposed under different settings, often dependent of the downstream tasks. As a result, it is challenging to study these algorithms in the literature systematically. Motivated by the recently proposed Veridical Data Science (VDS) framework, we propose a framework for network embedding algorithms and discuss how the principles of predictability, computability and stability apply in this context. The utilization of this framework in network embedding holds the potential to motivate and point to new directions for future research.