Pan, Erlin
One Node One Model: Featuring the Missing-Half for Graph Clustering
Xie, Xuanting, Li, Bingheng, Pan, Erlin, Guo, Zhaochen, Kang, Zhao, Chen, Wenyu
Most existing graph clustering methods primarily focus on exploiting topological structure, often neglecting the ``missing-half" node feature information, especially how these features can enhance clustering performance. This issue is further compounded by the challenges associated with high-dimensional features. Feature selection in graph clustering is particularly difficult because it requires simultaneously discovering clusters and identifying the relevant features for these clusters. To address this gap, we introduce a novel paradigm called ``one node one model", which builds an exclusive model for each node and defines the node label as a combination of predictions for node groups. Specifically, the proposed ``Feature Personalized Graph Clustering (FPGC)" method identifies cluster-relevant features for each node using a squeeze-and-excitation block, integrating these features into each model to form the final representations. Additionally, the concept of feature cross is developed as a data augmentation technique to learn low-order feature interactions. Extensive experimental results demonstrate that FPGC outperforms state-of-the-art clustering methods. Moreover, the plug-and-play nature of our method provides a versatile solution to enhance GNN-based models from a feature perspective.
Provable Filter for Real-world Graph Clustering
Xie, Xuanting, Pan, Erlin, Kang, Zhao, Chen, Wenyu, Li, Bingheng
Graph clustering, an important unsupervised problem, has been shown to be more resistant to advances in Graph Neural Networks (GNNs). In addition, almost all clustering methods focus on homophilic graphs and ignore heterophily. This significantly limits their applicability in practice, since real-world graphs exhibit a structural disparity and cannot simply be classified as homophily and heterophily. Thus, a principled way to handle practical graphs is urgently needed. To fill this gap, we provide a novel solution with theoretical support. Interestingly, we find that most homophilic and heterophilic edges can be correctly identified on the basis of neighbor information. Motivated by this finding, we construct two graphs that are highly homophilic and heterophilic, respectively. They are used to build low-pass and high-pass filters to capture holistic information. Important features are further enhanced by the squeeze-and-excitation block. We validate our approach through extensive experiments on both homophilic and heterophilic graphs. Empirical results demonstrate the superiority of our method compared to state-of-the-art clustering methods.
CDC: A Simple Framework for Complex Data Clustering
Kang, Zhao, Xie, Xuanting, Li, Bingheng, Pan, Erlin
In today's data-driven digital era, the amount as well as complexity, such as multi-view, non-Euclidean, and multi-relational, of the collected data are growing exponentially or even faster. Clustering, which unsupervisely extracts valid knowledge from data, is extremely useful in practice. However, existing methods are independently developed to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first utilize graph filtering to fuse geometry structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We illustrate the cluster-ability of our proposed method theoretically and experimentally. In particular, we deploy CDC to graph data of size 111M.
PC-Conv: Unifying Homophily and Heterophily with Two-fold Filtering
Li, Bingheng, Pan, Erlin, Kang, Zhao
Recently, many carefully crafted graph representation learning methods have achieved impressive performance on either strong heterophilic or homophilic graphs, but not both. Therefore, they are incapable of generalizing well across real-world graphs with different levels of homophily. This is attributed to their neglect of homophily in heterophilic graphs, and vice versa. In this paper, we propose a two-fold filtering mechanism to extract homophily in heterophilic graphs and vice versa. In particular, we extend the graph heat equation to perform heterophilic aggregation of global information from a long distance. The resultant filter can be exactly approximated by the Possion-Charlier (PC) polynomials. To further exploit information at multiple orders, we introduce a powerful graph convolution PC-Conv and its instantiation PCNet for the node classification task. Compared with state-of-the-art GNNs, PCNet shows competitive performance on well-known homophilic and heterophilic graphs. Our implementation is available at https://github.com/uestclbh/PC-Conv.
Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering
Pan, Erlin, Kang, Zhao
However, they are designed on of connected nodes belong to different classes (Pei et al., the homophilic assumption of graph and clustering 2020; Xie et al., 2023). Traditional GNNs learn representations on heterophilic graph is overlooked. Due to via message passing mechanism under the assumption the lack of labels, it is impossible to first identify of homophily (Fang et al., 2022). Facing heterophilic graphs, a graph as homophilic or heterophilic before a previous approaches mainly suffer two limitations. On the suitable GNN model can be found. Hence, clustering one hand, the local neighbors in a graph are nodes that are on real-world graph with various levels of proximally located, while nodes that are semantically similar homophily poses a new challenge to the graph might be far apart on heterophilic graph (Zhu et al., research community. To fill this gap, we propose 2020). Thus, existing techniques fail to capture long-range a novel graph clustering method, which contains information from distant nodes. On the other hand, they three key components: graph reconstruction, don't distinguish similar and dissimilar neighbors, which a mixed filter, and dual graph clustering carry different amounts of information.
Multi-view Contrastive Graph Clustering
Pan, Erlin, Kang, Zhao
With the explosive growth of information technology, multi-view graph data have become increasingly prevalent and valuable. Most existing multi-view clustering techniques either focus on the scenario of multiple graphs or multi-view attributes. In this paper, we propose a generic framework to cluster multi-view attributed graph data. Specifically, inspired by the success of contrastive learning, we propose multi-view contrastive graph clustering (MCGC) method to learn a consensus graph since the original graph could be noisy or incomplete and is not directly applicable. Our method composes of two key steps: we first filter out the undesirable high-frequency noise while preserving the graph geometric features via graph filtering and obtain a smooth representation of nodes; we then learn a consensus graph regularized by graph contrastive loss. Results on several benchmark datasets show the superiority of our method with respect to state-of-the-art approaches. In particular, our simple approach outperforms existing deep learning-based methods.