Selective Inference for Hierarchical Clustering

Gao, Lucy L., Bien, Jacob, Witten, Daniela

Dec-4-2020–arXiv.org Machine Learning

Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the Type I error rate when the groups are defined a priori. However, when the groups are instead defined via a clustering algorithm, then applying a classical test for a difference in means between the groups yields an extremely inflated Type I error rate. Notably, this problem persists even if two separate and independent data sets are used to define the groups and to test for a difference in their means. To address this problem, in this paper, we propose a selective inference approach to test for a difference in means between two clusters obtained from any clustering method. Our procedure controls the selective Type I error rate by accounting for the fact that the null hypothesis was generated from the data. We describe how to efficiently compute exact p-values for clusters obtained using agglomerative hierarchical clustering with many commonly used linkages. We apply our method to simulated data and to single-cell RNA-seq data.

euclidean distance, linkage, true cluster, (15 more...)

arXiv.org Machine Learning

Dec-4-2020

arXiv.org PDF

Add feedback

Country:
- Antarctica (0.04)
- North America > United States
  - California (0.14)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found