co-clustering
Learning A Structured Optimal Bipartite Graph for Co-Clustering
Co-clustering methods have been widely applied to document clustering and gene expression analysis. These methods make use of the duality between features and samples such that the co-occurring structure of sample and feature clusters can be extracted. In graph based co-clustering methods, a bipartite graph is constructed to depict the relation between features and samples. Most existing co-clustering methods conduct clustering on the graph achieved from the original data matrix, which doesn't have explicit cluster structure, thus they require a post-processing step to obtain the clustering results. In this paper, we propose a novel co-clustering method to learn a bipartite graph with exactly k connected components, where k is the number of clusters. The new bipartite graph learned in our model approximates the original graph but maintains an explicit cluster structure, from which we can immediately get the clustering results without post-processing. Extensive empirical results are presented to verify the effectiveness and robustness of our model.
Scalable Robust Bayesian Co-Clustering with Compositional ELBOs
Vinod, Ashwin, Bajaj, Chandrajit
Co-clustering exploits the duality of instances and features to simultaneously uncover meaningful groups in both dimensions, often outperforming traditional clustering in high-dimensional or sparse data settings. Although recent deep learning approaches successfully integrate feature learning and cluster assignment, they remain susceptible to noise and can suffer from posterior collapse within standard autoencoders. In this paper, we present the first fully variational Co-clustering framework that directly learns row and column clusters in the latent space, leveraging a doubly reparameterized ELBO to improve gradient signal-to-noise separation. Our unsupervised model integrates a Variational Deep Embedding with a Gaussian Mixture Model (GMM) prior for both instances and features, providing a built-in clustering mechanism that naturally aligns latent modes with row and column clusters. Furthermore, our regularized end-to-end noise learning Compositional ELBO architecture jointly reconstructs the data while regularizing against noise through the KL divergence, thus gracefully handling corrupted or missing inputs in a single training pipeline. To counteract posterior collapse, we introduce a scale modification that increases the encoder's latent means only in the reconstruction pathway, preserving richer latent representations without inflating the KL term. Finally, a mutual information-based cross-loss ensures coherent co-clustering of rows and columns. Empirical results on diverse real-world datasets from multiple modalities, numerical, textual, and image-based, demonstrate that our method not only preserves the advantages of prior Co-clustering approaches but also exceeds them in accuracy and robustness, particularly in high-dimensional or noisy settings.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Wisconsin (0.04)
- Europe > Germany > Bavaria > Regensburg (0.04)
- Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.94)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
Co-clustering for Federated Recommender System
He, Xinrui, Liu, Shuo, Keung, Jackey, He, Jingrui
As data privacy and security attract increasing attention, Federated Recommender System (FRS) offers a solution that strikes a balance between providing high-quality recommendations and preserving user privacy. However, the presence of statistical heterogeneity in FRS, commonly observed due to personalized decision-making patterns, can pose challenges. To address this issue and maximize the benefit of collaborative filtering (CF) in FRS, it is intuitive to consider clustering clients (users) as well as items into different groups and learning group-specific models. Existing methods either resort to client clustering via user representations-risking privacy leakage, or employ classical clustering strategies on item embeddings or gradients, which we found are plagued by the curse of dimensionality. In this paper, we delve into the inefficiencies of the K-Means method in client grouping, attributing failures due to the high dimensionality as well as data sparsity occurring in FRS, and propose CoFedRec, a novel Co-clustering Federated Recommendation mechanism, to address clients heterogeneity and enhance the collaborative filtering within the federated framework. Specifically, the server initially formulates an item membership from the client-provided item networks. Subsequently, clients are grouped regarding a specific item category picked from the item membership during each communication round, resulting in an intelligently aggregated group model. Meanwhile, to comprehensively capture the global inter-relationships among items, we incorporate an additional supervised contrastive learning term based on the server-side generated item membership into the local training phase for each client. Extensive experiments on four datasets are provided, which verify the effectiveness of the proposed CoFedRec.
- Asia > Singapore > Central Region > Singapore (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Hong Kong (0.04)
- (2 more...)
Reviews: Learning A Structured Optimal Bipartite Graph for Co-Clustering
The authors propose a new method for co-clustering. The idea is to learn a bipartite graph with exactly k connected components. This way, the clusters can be directly inferred and no further preprocessing step (like executing k-means) is necessary. After introducing their approach the authors conduct experiments on a synthetic data set as well as on four benchmark data sets. I think that the proposed approach is interesting. However, there are some issues.
Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
Wu, Zihan, Huang, Zhaoke, Yan, Hong
Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
Co-clustering based exploratory analysis of mixed-type data tables
Bouchareb, Aichetou, Boullé, Marc, Clérot, Fabrice, Rossi, Fabrice
Co-clustering is a class of unsupervised data analysis techniques that extract the existing underlying dependency structure between the instances and variables of a data table as homogeneous blocks. Most of those techniques are limited to variables of the same type. In this paper, we propose a mixed data co-clustering method based on a two-step methodology. In the first step, all the variables are binarized according to a number of bins chosen by the analyst, by equal frequency discretization in the numerical case, or keeping the most frequent values in the categorical case. The second step applies a co-clustering to the instances and the binary variables, leading to groups of instances and groups of variable parts. We apply this methodology on several data sets and compare with the results of a Multiple Correspondence Analysis applied to the same data.
- Europe > France > Île-de-France > Paris > Paris (0.14)
- Europe > Germany (0.04)
- Asia > Philippines (0.04)
- (8 more...)
Learning A Structured Optimal Bipartite Graph for Co-Clustering
Nie, Feiping, Wang, Xiaoqian, Deng, Cheng, Huang, Heng
Co-clustering methods have been widely applied to document clustering and gene expression analysis. These methods make use of the duality between features and samples such that the co-occurring structure of sample and feature clusters can be extracted. In graph based co-clustering methods, a bipartite graph is constructed to depict the relation between features and samples. Most existing co-clustering methods conduct clustering on the graph achieved from the original data matrix, which doesn't have explicit cluster structure, thus they require a post-processing step to obtain the clustering results. In this paper, we propose a novel co-clustering method to learn a bipartite graph with exactly k connected components, where k is the number of clusters.
Robust Continuous Co-Clustering
He, Xiao, Moreira-Matias, Luis
Clustering consists of grouping together samples giving their similar properties. The problem of modeling simultaneously groups of samples and features is known as Co-Clustering. This paper introduces ROCCO - a Robust Continuous Co-Clustering algorithm. ROCCO is a scalable, hyperparameter-free, easy and ready to use algorithm to address Co-Clustering problems in practice over massive cross-domain datasets. It operates by learning a graph-based two-sided representation of the input matrix. The underlying proposed optimization problem is non-convex, which assures a flexible pool of solutions. Moreover, we prove that it can be solved with a near linear time complexity on the input size. An exhaustive large-scale experimental testbed conducted with both synthetic and real-world datasets demonstrates ROCCO's properties in practice: (i) State-of-the-art performance in cross-domain real-world problems including Biomedicine and Text Mining; (ii) very low sensitivity to hyperparameter settings; (iii) robustness to noise and (iv) a linear empirical scalability in practice. These results highlight ROCCO as a powerful general-purpose co-clustering algorithm for cross-domain practitioners, regardless of their technical background.
- Asia (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
Co-Clustering Can Provide Industrial Data Pattern Discovery
In spite of the rapid development in data acquisition technology resulting in the explosive collection of acquired datasets, techniques such as data organization and classification, manipulation, and analysis of very large, diverse, heterogeneous datasets have only evolved modestly. This has led to hindrances in effective utility and better understanding of the acquired, large-scale data for knowledge discovery. In an industrial setting, an interesting visual from McKinsey illustrates that despite collecting data from tens of thousands of sensors, less than 1% is actually utilized. Data clustering is the classification of data objects into different groups (clusters) such that data objects in one group are similar together and dissimilar from another group. Typically, homogeneous data objects, i.e. data objects having the same data type, are grouped together using some of the well-known clustering algorithms.
Co-Clustering Can Provide Industrial Data Pattern Discovery
In spite of the rapid development in data acquisition technology resulting in the explosive collection of acquired datasets, techniques such as data organization and classification, manipulation, and analysis of very large, diverse, heterogeneous datasets have only evolved modestly. This has led to hindrances in effective utility and better understanding of the acquired, large-scale data for knowledge discovery. In an industrial setting, an interesting visual from McKinsey illustrates that despite collecting data from tens of thousands of sensors, less than 1% is actually utilized. Data clustering is the classification of data objects into different groups (clusters) such that data objects in one group are similar together and dissimilar from another group. Typically, homogeneous data objects, i.e. data objects having the same data type, are grouped together using some of the well-known clustering algorithms.