Goto

Collaborating Authors

 block model








SupplementaryMaterial MatrixCompletionwithHierarchical GraphSideInformation

Neural Information Processing Systems

This implies that M(δ) = T(δ), i.e., the constraint(13) made in T(δ) does not lose any generality in matrix representation. One technical distinction relative to the previous works [2,3] arises from the fact that in our setting, the hamming distances(dx1(`),dx2(`),dx3(`)) defined w.r.t. We focus on the family of rating matrices{Mhci: c T`}. First, we present the following lemma that guarantees the existence of two subsets of users with certainproperties. The proof of this case follows the same structure as that of the grouping-limited regime. It is shown that the groups within each cluster are recovered with a vanishing fraction of errors if Ig = ω(1/n).


MatrixCompletionwithHierarchical GraphSideInformation

Neural Information Processing Systems

First wecharacterize theinformation-theoretic sharp threshold on the minimum number of observed matrix entries required for reliable matrix completion, as a function of the quantified quality (to be detailed) of the considered hierarchical graph side information.


Matrix Factorization Framework for Community Detection under the Degree-Corrected Block Model

Dache, Alexandra, Vandaele, Arnaud, Gillis, Nicolas

arXiv.org Machine Learning

Community detection is a fundamental task in data analysis. Block models form a standard approach to partition nodes according to a graph model, facilitating the analysis and interpretation of the network structure. By grouping nodes with similar connection patterns, they enable the identification of a wide variety of underlying structures. The degree-corrected block model (DCBM) is an established model that accounts for the heterogeneity of node degrees. However, existing inference methods for the DCBM are heuristics that are highly sensitive to initialization, typically done randomly. In this work, we show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem. Leveraging this insight, we propose a novel method for community detection and a theoretically well-grounded initialization strategy that provides an initial estimate of communities for inference algorithms. Our approach is agnostic to any specific network structure and applies to graphs with any structure representable by a DCBM, not only assortative ones. Experiments on synthetic and real benchmark networks show that our method detects communities comparable to those found by DCBM inference, while scaling linearly with the number of edges and communities; for instance, it processes a graph with 100,000 nodes and 2,000,000 edges in approximately 4 minutes. Moreover, the proposed initialization strategy significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms. Overall, this work provides a scalable and robust framework for community detection and highlights the benefits of a matrix-factorization perspective for the DCBM.


Thinned random measures for sparse graphs with overlapping communities

Neural Information Processing Systems

Network models for exchangeable arrays, including most stochastic block models, generate dense graphs with a limited ability to capture many characteristics of real-world social and biological networks. A class of models based on completely random measures like the generalized gamma process (GGP) have recently addressed some of these limitations. We propose a framework for thinning edges from realizations of GGP random graphs that models observed links via nodes' overall propensity to interact, as well as the similarity of node memberships within a large set of latent communities. Our formulation allows us to learn the number of communities from data, and enables efficient Monte Carlo methods that scale linearly with the number of observed edges, and thus (unlike dense block models) sub-quadratically with the number of entities or nodes. We compare to alternative models for both dense and sparse networks, and demonstrate effective recovery of latent community structure for real-world networks with thousands of nodes.