A Generalized Similarity U Test for Multivariate Analysis of Sequencing Data
arXiv.org Artificial Intelligence
Summary: Sequencing-based studies are emerging as a major tool for genetic association studies of complex diseases. These studies pose great challenges to traditional statistical methods because of the high dimensionality of the data and the low frequency of the genetic variants. Moreover, there is great interest in biology and epidemiology in identifying genetic risk factors that contribute to multiple disease phenotypes. These phenotypes often follow different distributions, which poses an additional challenge to the current statistical framework. In this paper, we propose a generalized similarity U test, referred to as GSU. GSU is a similarity-based test that can handle high-dimensional genotypes and phenotypes. We studied the theoretical properties of GSU and provided an efficient p-value calculation for the association test, as well as sample size and power calculations for study design. Through simulation, we found that GSU had advantages over existing methods in terms of power and robustness to phenotype distributions. Finally, we used GSU to perform a multivariate analysis of sequencing data in the Dallas Heart Study and identified a joint association of 4 genes with 5 metabolism-related phenotypes.

Key words: Weighted U Statistic; Sequencing Study; Non-parametric Statistics.

1. Introduction

Genome-wide association studies (GWAS) have made substantial progress in discovering common genetic variants associated with complex diseases. Despite this success, a large proportion of the heritability of complex diseases remains unexplained.
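The summary describes GSU only at a high level: a similarity-based, weighted U statistic over high-dimensional genotypes and phenotypes. As a rough illustration of that general idea, and not the authors' exact GSU definition, the Python sketch below pairs a genetic similarity w_ij with a phenotype similarity h_ij and averages their product over all distinct pairs of subjects. The IBS-style genotype kernel, the Gaussian phenotype kernel, the double-centering step, and every function name here are our own assumptions. The paper's efficient p-value calculation is not reproduced; a plain permutation test is swapped in instead.

```python
import numpy as np


def ibs_similarity(genotypes):
    """Assumed genetic similarity: IBS-style kernel for 0/1/2-coded genotypes.

    Returns an n x n matrix with 1 - mean(|g_i - g_j|) / 2 for each pair of
    subjects, so identical genotypes score 1 and opposite homozygotes score 0.
    """
    diff = np.abs(genotypes[:, None, :] - genotypes[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0


def gaussian_similarity(phenotypes):
    """Assumed phenotype similarity: Gaussian kernel on standardized columns.

    Standardizing puts phenotypes measured on different scales on an equal
    footing; a zero-variance column would divide by zero and must be dropped.
    """
    z = (phenotypes - phenotypes.mean(axis=0)) / phenotypes.std(axis=0)
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / z.shape[1])


def double_center(sim):
    """Remove row, column, and grand means (H K H with H = I - 11'/n)."""
    return (sim - sim.mean(axis=1, keepdims=True)
            - sim.mean(axis=0, keepdims=True) + sim.mean())


def similarity_u(geno_sim, pheno_sim):
    """Weighted U statistic: average of w_ij * h_ij over all pairs i != j."""
    n = geno_sim.shape[0]
    w = double_center(geno_sim)
    h = double_center(pheno_sim)
    off_diag = ~np.eye(n, dtype=bool)
    return (w[off_diag] * h[off_diag]).sum() / (n * (n - 1))


def permutation_pvalue(geno_sim, pheno_sim, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffle subjects' phenotypes under H0."""
    rng = np.random.default_rng(seed)
    observed = similarity_u(geno_sim, pheno_sim)
    n = pheno_sim.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permuting rows and columns together relabels whose phenotype is whose.
        if similarity_u(geno_sim, pheno_sim[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, with a 0/1/2 genotype matrix `G` (subjects by variants) and a phenotype matrix `Y` (subjects by phenotypes), `permutation_pvalue(ibs_similarity(G), gaussian_similarity(Y))` returns the statistic and its p-value. Because each kernel only compares subjects pairwise, the phenotypes do not need to follow a common parametric distribution, which is in the spirit of the robustness the summary attributes to GSU.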
Aug-18-2025
- Country:
- North America > United States
- Michigan > Ingham County
- East Lansing (0.04)
- Lansing (0.04)
- Texas > Tarrant County
- Fort Worth (0.04)
- Genre:
- Research Report > Experimental Study (1.00)