row cluster
Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model
Khoufache, Reda, Belhadj, Anisse, Azzag, Hanene, Lebbah, Mustapha
Given a data matrix, where rows represent observations and columns represent variables or features, co-clustering, also known as bi-clustering aims to infer a row partition and a column partition simultaneously. The resulting partition is composed of homogeneous blocks. When a dataset exhibits a dual structure between observations and variables, co-clustering outperforms conventional clustering algorithms which only infers a row partition without considering the relationships between observations and variables. Co-clustering is a powerful data mining tool for two-dimensional data and is widely applied in various fields such as bioinformatics [1]. To tackle the co-clustering problem, the Latent Block Model (LBM) was introduced by [2].
- Europe > France > Île-de-France > Yvelines > Versailles (0.05)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations
Marette, Thibault, Miettinen, Pauli, Neumann, Stefan
Finding (bi-)clusters in bipartite graphs is a popular data analysis approach. Analysts typically want to visualize the clusters, which is simple as long as the clusters are disjoint. However, many modern algorithms find overlapping clusters, making visualization more complicated. In this paper, we study the problem of visualizing \emph{a given clustering} of overlapping clusters in bipartite graphs and the related problem of visualizing Boolean Matrix Factorizations. We conceptualize three different objectives that any good visualization should satisfy: (1) proximity of cluster elements, (2) large consecutive areas of elements from the same cluster, and (3) large uninterrupted areas in the visualization, regardless of the cluster membership. We provide objective functions that capture these goals and algorithms that optimize these objective functions. Interestingly, in experiments on real-world datasets, we find that the best trade-off between these competing goals is achieved by a novel heuristic, which locally aims to place rows and columns with similar cluster membership next to each other.
- North America > United States > Pennsylvania (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Europe > Finland > Northern Savo > Kuopio (0.04)
Consistency of spectral clustering for directed network community detection
Directed networks appear in various areas, such as biology, sociology, physiology and computer science. However, at present, most network analysis ignores the direction. In this paper, we construct a spectral clustering method based on the singular decomposition of the adjacency matrix to detect community in directed stochastic block model (DiSBM). By considering a sparsity parameter, under some mild conditions, we show the proposed approach can consistently recover hidden row and column communities for different scaling of degrees. By considering the degree heterogeneity of both row and column nodes, we further establish a theoretical framework for directed degree corrected stochastic block model (DiDCSBM). We show that the spectral clustering method stably yields consistent community detection for row clusters and column clusters under mild constraints on the degree heterogeneity. Our theoretical results under DiSBM and DiDCSBM provide some innovations on some special directed networks, such as directed network with balanced clusters, directed network with nodes enjoying similar degrees, and the directed Erd\"os-R\'enyi graph. Furthermore, our theoretical results under DiDCSBM are consistent with those under DiSBM when DiDCSBM degenerates to DiSBM.
- North America > United States (0.14)
- Asia > China (0.04)
Bipartite mixed membership stochastic blockmodel
Mixed membership problem for undirected network has been well studied in network analysis recent years. However, the more general case of mixed membership for directed network remains a challenge. Here, we propose an interpretable model: bipartite mixed membership stochastic blockmodel (BiMMSB for short) for directed mixed membership networks. BiMMSB allows that row nodes and column nodes of the adjacency matrix can be different and these nodes may have distinct community structure in a directed network. We also develop an efficient spectral algorithm called BiMPCA to estimate the mixed memberships for both row nodes and column nodes in a directed network. We show that the approach is asymptotically consistent under BiMMSB. We demonstrate the advantages of BiMMSB with applications to a small-scale simulation study, the directed Political blogs network and the Papers Citations network.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Jiangsu Province > Xuzhou (0.04)
Randomized spectral co-clustering for large-scale directed networks
Guo, Xiao, Qiu, Yixuan, Zhang, Hai, Chang, Xiangyu
Directed networks are generally used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose computational challenge to it. In this paper, we leverage randomized sketching techniques to accelerate the spectral co-clustering algorithms in order to co-cluster large-scale directed networks more efficiently. Specifically, we derive two series of randomized spectral co-clustering algorithms, one is random-projection-based and the other is random-samplingbased. Theoretically, we analyze the resulting algorithms under two generative models-the stochastic co-block model and the degree corrected stochastic co-block model. The approximation error rates and misclustering error rates of proposed two randomized spectral co-clustering algorithms are established, which indicate better bounds than the state-ofthe-art results of co-clustering literature. Numerically, we conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to tens of millions of nodes. In order to use the proposed algorithms more conveniently, a new R package called RandClust is developed and made available to the public.
- North America > United States (0.46)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)