Multi-view Banded Spectral Clustering with application to ICD9 clustering

Zhang, Luwan; Liao, Katherine; Kohane, Issac; Cai, Tianxi

arXiv.org Machine Learning 

Despite recent methodological advances, community detection remains a challenging problem. The existing literature largely focuses on the standard setting in which a network is learned from an observed adjacency matrix from a single data source. Constructing a shared network from multiple data sources is more challenging because of heterogeneity across populations. Moreover, when the nodes of interest have a natural ordering, no existing method takes that information into account. Motivated by the problem of grouping International Classification of Diseases, Ninth Revision (ICD9) codes to represent clinically meaningful phenotypes, we propose a novel spectral clustering method that optimally combines multiple data sources while leveraging prior knowledge of the ordering. The proposed method combines a banding step, which encourages a desired moving average structure, with a subsequent weighting step that maximizes consensus across sources. Its statistical performance is studied under a multi-view stochastic block model. We also provide a simple rule for choosing the weights in practice. The efficacy and robustness of the method are demonstrated through extensive simulations. Finally, we apply the method to the ICD9 coding system and obtain an insightful clustering structure by integrating information from a large claims database and two healthcare systems.
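To make the band-then-weight idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The abstract does not define the banding operator or the weight rule, so the sketch assumes that banding means zeroing entries far from the diagonal and that the weights are fixed and supplied by the user. The names `band`, `combine_views`, `spectral_labels`, and the `bandwidth` parameter are all hypothetical.

```python
import numpy as np

def band(A, bandwidth):
    """Keep entries with |i - j| <= bandwidth; zero the rest (assumed banding rule)."""
    idx = np.arange(A.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    return A * mask

def combine_views(views, weights, bandwidth):
    """Band each view, then take a weighted average (weights are user-supplied here)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * band(A, bandwidth) for wi, A in zip(w, views))

def kmeans(X, k, n_iter=20):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_labels(S, k):
    """Cluster the rows of the k leading eigenvectors (by absolute eigenvalue) of S."""
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(np.abs(vals))[-k:]
    return kmeans(vecs[:, top], k)

# Toy example: 10 ordered nodes in two contiguous groups of sizes 4 and 6.
truth = np.array([0] * 4 + [1] * 6)
block = (truth[:, None] == truth[None, :]).astype(float)
view1 = block.copy()
view2 = 0.8 * block
view2[0, 9] = view2[9, 0] = 1.0  # spurious link between distant nodes

S = combine_views([view1, view2], weights=[0.5, 0.5], bandwidth=5)
labels = spectral_labels(S, k=2)
```

In this toy example the spurious link between nodes 0 and 9 in the second view lies outside the band, so it is removed before the views are averaged, and the two contiguous groups are recovered. A data-driven weight rule, such as the one the abstract mentions, would replace the fixed `weights` argument.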
