Chen, Pin-Yu
Multilayer Spectral Graph Clustering via Convex Layer Aggregation
Chen, Pin-Yu, Hero, Alfred O. III
Multilayer graphs are commonly used for representing different relations between entities and handling heterogeneous data processing tasks. New challenges arise in multilayer graph clustering for assigning clusters to a common multilayer node set and for combining information from each layer. This paper presents a theoretical framework for multilayer spectral graph clustering of the nodes via convex layer aggregation. Under a novel multilayer signal plus noise model, we provide a phase transition analysis that establishes the existence of a critical value on the noise level that permits reliable cluster separation. The analysis also specifies analytical upper and lower bounds on the critical value, where the bounds become exact when the clusters have identical sizes. Numerical experiments on synthetic multilayer graphs are conducted to validate the phase transition analysis and study the effect of layer weights and noise levels on clustering reliability.
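As a rough illustration of the aggregation idea described in this abstract, the sketch below forms a convex combination of per-layer adjacency matrices and then splits the nodes in two with the Fiedler vector of the aggregated graph Laplacian. The function names, the toy two-layer graph, and the equal layer weights are assumptions made for illustration; the paper's signal plus noise model and phase transition analysis are not reproduced here.

```python
import numpy as np

def aggregate_layers(layers, weights):
    """Convex combination of layer adjacency matrices (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("layer weights must be nonnegative and sum to 1")
    return sum(w * np.asarray(A, dtype=float) for w, A in zip(weights, layers))

def two_way_partition(adjacency):
    """Split nodes by the sign of the Fiedler vector of L = D - W."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigenvectors[:, 1]                 # second-smallest eigenpair
    return (fiedler > 0).astype(int)

# Toy data (assumed): two triangles {0,1,2} and {3,4,5} present in both layers.
triangles = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    triangles[i, j] = triangles[j, i] = 1.0

layer_a = triangles.copy()
layer_a[2, 3] = layer_a[3, 2] = 1.0  # one cross-cluster ("noise") edge in layer A
layer_b = triangles.copy()           # layer B has no cross-cluster edges

aggregated = aggregate_layers([layer_a, layer_b], [0.5, 0.5])
labels = two_way_partition(aggregated)
```

Because the sign of an eigenvector is arbitrary, only the grouping of `labels` is meaningful, not which group is called 0 or 1. Changing the weights changes how strongly the cross-cluster edge of layer A counts in the aggregated graph.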
AMOS: An Automated Model Order Selection Algorithm for Spectral Graph Clustering
Chen, Pin-Yu, Gensollen, Thibaut, Hero, Alfred O. III
One of the longstanding problems in spectral graph clustering (SGC) is the so-called model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent analysis of clustering reliability for SGC under the random interconnection model, AMOS works by incrementally increasing the number of clusters, estimating the quality of identified clusters, and providing a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliability guarantees. Compared with three other automated graph clustering methods on real-world datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics.
Incremental Method for Spectral Clustering of Increasing Orders
Chen, Pin-Yu, Zhang, Baichuan, Hasan, Mohammad Al, Hero, Alfred O.
The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications the number of clusters or communities (say, $K$) is generally unknown a priori. Consequently, the majority of the existing methods either choose $K$ heuristically or repeat the clustering method with different choices of $K$ and accept the best clustering result. The first option often yields suboptimal results, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of the graph Laplacian matrix to obtain the $K$-th eigenpair of the Laplacian matrix given a collection of all the $K-1$ smallest eigenpairs. Our proposed method adapts the Laplacian matrix such that the batch eigenvalue decomposition problem transforms into an efficient sequential leading eigenpair computation problem. As a practical application, we consider user-guided spectral clustering. Specifically, we demonstrate that users can utilize the proposed incremental method for effective eigenpair computation and for determining the desired number of clusters based on multiple clustering metrics.
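One way to picture turning the $K$-th smallest eigenpair into a leading eigenpair problem is standard deflation followed by power iteration, sketched below. This is a generic construction chosen for illustration and is not claimed to be the authors' exact matrix adaptation; the function name, the Gershgorin bound used as the shift, and the path-graph example are all assumptions.

```python
import numpy as np

def next_smallest_eigenpair(laplacian, known_values, known_vectors,
                            iters=5000, tol=1e-12, seed=0):
    """Given the K-1 smallest eigenpairs (vectors as orthonormal columns),
    estimate the K-th smallest eigenpair via deflation + power iteration."""
    L = np.array(laplacian, dtype=float)
    n = L.shape[0]
    # Gershgorin bound: no eigenvalue of L exceeds the largest absolute row sum.
    bound = np.abs(L).sum(axis=1).max()
    # Lift each known eigenvalue up to `bound` so it is no longer among the smallest.
    shifted = L.copy()
    for lam, v in zip(known_values, known_vectors.T):
        shifted += (bound - lam) * np.outer(v, v)
    # The K-th smallest eigenvalue of L is now the leading one of bound*I - shifted.
    target = bound * np.eye(n) - shifted
    x = np.random.default_rng(seed).standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = target @ x
        norm = np.linalg.norm(y)
        if norm == 0.0:
            break
        y /= norm
        converged = np.linalg.norm(y - x) < tol
        x = y
        if converged:
            break
    return float(x @ L @ x), x  # Rayleigh quotient with respect to the original L

# Example (assumed): path graph on 5 nodes; the trivial eigenpair (0, constant) is known.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
known_values = [0.0]
known_vectors = np.ones((n, 1)) / np.sqrt(n)
value, vector = next_smallest_eigenpair(L, known_values, known_vectors)
```

Calling the function again with the newly found pair appended to the known ones would give the next eigenpair, so $K$ can be increased one step at a time without recomputing a full decomposition. Power iteration converges slowly when consecutive eigenvalues are close, so a practical implementation would likely use a Lanczos-type solver instead.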
Multi-centrality Graph Spectral Decompositions and their Application to Cyber Intrusion Detection
Chen, Pin-Yu, Choudhury, Sutanay, Hero, Alfred O.
Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the Fiedler vector of the centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles of graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity patterns and MC-GDL can provide a discriminative basis for attack classification.
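To make the notion of a multi-feature node matrix concrete, the sketch below stacks a few walk-count features per node into a matrix and applies ordinary PCA through an SVD. The choice of features (1-, 2- and 3-step walk counts), the standardization, and the function name are assumptions for illustration only; the actual multi-centrality matrices used by MC-GPCA and MC-GDL may be constructed differently.

```python
import numpy as np

def multi_feature_pca(adjacency, n_components=2):
    """PCA on a node-by-feature matrix built from simple walk statistics."""
    A = np.asarray(adjacency, dtype=float)
    ones = np.ones(A.shape[0])
    walks1 = A @ ones        # degree = number of 1-step walks from each node
    walks2 = A @ walks1      # number of 2-step walks from each node
    walks3 = A @ walks2      # number of 3-step walks from each node
    features = np.column_stack([walks1, walks2, walks3])
    # Standardize columns so no single feature dominates by scale.
    centered = features - features.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0      # leave constant columns at zero instead of dividing by 0
    Z = centered / std
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # node coordinates
    explained = S**2 / np.sum(S**2)                  # variance ratio per component
    return scores, Vt[:n_components], explained[:n_components]

# Small example graph (assumed): node 0 is a hub, node 4 hangs off node 3.
adjacency = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    adjacency[i, j] = adjacency[j, i] = 1.0

scores, components, explained = multi_feature_pca(adjacency)
```

Nodes whose scores lie far from the bulk along the leading components have unusual combinations of the chosen features, which is the kind of signal an anomaly indicator could build on.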
Supervised Collective Classification for Crowdsourcing
Chen, Pin-Yu, Lien, Chia-Wei, Chu, Fu-Jen, Ting, Pai-Shun, Cheng, Shin-Ming
Crowdsourcing utilizes the wisdom of crowds for collective classification via information (e.g., labels of an item) provided by labelers. Current crowdsourcing algorithms are mainly unsupervised methods that are unaware of the quality of crowdsourced data. In this paper, we propose a supervised collective classification algorithm that aims to identify reliable labelers from the training data (e.g., items with known labels). The reliability (i.e., weighting factor) of each labeler is determined via a saddle point algorithm. The results on several crowdsourced datasets show that supervised methods can achieve better classification accuracy than unsupervised methods, and our proposed method outperforms other algorithms.
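The general idea of learning labeler weights from items with known labels can be sketched as follows. Note that this example replaces the saddle point algorithm of the paper with a much simpler rule, the log-odds of each labeler's training accuracy, purely for illustration; the function names and the toy data are assumptions.

```python
import math

def labeler_weights(train_answers, truth, eps=1e-3):
    """train_answers[j][i] is labeler j's +1/-1 answer on training item i.
    Weight = log-odds of training accuracy (a stand-in for the paper's method)."""
    weights = []
    for answers in train_answers:
        accuracy = sum(a == t for a, t in zip(answers, truth)) / len(truth)
        accuracy = min(max(accuracy, eps), 1 - eps)  # avoid log(0) and division by 0
        weights.append(math.log(accuracy / (1 - accuracy)))
    return weights

def weighted_vote(answers, weights):
    """Combine +1/-1 answers on one item using the learned weights."""
    score = sum(w * a for w, a in zip(weights, answers))
    return 1 if score >= 0 else -1

# Toy training data (assumed): 4 items with known labels, 3 labelers.
truth = [1, -1, 1, -1]
train_answers = [
    [1, -1, 1, -1],    # labeler 0: always correct
    [1, -1, -1, 1],    # labeler 1: correct half of the time
    [-1, 1, -1, -1],   # labeler 2: correct once out of four
]
weights = labeler_weights(train_answers, truth)

# New item: an unweighted majority vote would say -1; the weighted vote follows labeler 0.
new_answers = [1, -1, -1]
prediction = weighted_vote(new_answers, weights)
```

A labeler at chance level receives weight zero, and a labeler who is usually wrong receives a negative weight, so that labeler's answers are effectively flipped before voting.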