Genetic heterogeneity analysis using genetic algorithm and network science
Sha, Zhendong, Chen, Yuanzhu, Hu, Ting
arXiv.org Artificial Intelligence
Genome-wide association studies (GWAS) identify disease-susceptibility genetic variables by comparing the genetic data of individuals with and without a specific disease. However, discovering these associations is challenging because of genetic heterogeneity and feature interactions. Genetic variables subject to these effects often exhibit smaller effect sizes and are therefore difficult to detect with machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interactions. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Experiments on synthetic data show that the GA-based feature selection method is effective at identifying feature interactions. Furthermore, we apply our approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.
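The abstract does not specify how the co-selection network is weighted or partitioned, so the following is only a minimal sketch of the general idea, assuming that an edge weight is simply the number of runs in which two features were selected together. The function names `build_coselection_network` and `prune`, the `min_count` threshold, and the SNP labels are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_coselection_network(runs):
    """Count how often each pair of features is selected together.

    `runs` is a list of feature subsets, one per independent
    feature selection run (assumed input format, not the paper's).
    """
    edge_weights = Counter()
    for selected in runs:
        # Sort so every unordered pair maps to one canonical key.
        for a, b in combinations(sorted(set(selected)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


def prune(edge_weights, min_count):
    """Keep only edges co-selected in at least `min_count` runs."""
    return {pair: w for pair, w in edge_weights.items() if w >= min_count}


# Hypothetical output of four independent feature selection runs.
runs = [
    ["snp1", "snp2", "snp3"],
    ["snp1", "snp2"],
    ["snp2", "snp3", "snp4"],
    ["snp1", "snp2", "snp4"],
]

network = build_coselection_network(runs)
strong_edges = prune(network, min_count=2)
```

Groups of densely connected features in such a network could then be passed to a community detection step, with each resulting subset summarized by a single score in the spirit of the CRS. Both of those steps are left out here because the abstract gives no detail on them.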
Aug-11-2023
- Country:
- North America > United States > New York > New York County > New York City (0.14)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Technology:
- Information Technology > Artificial Intelligence > Machine Learning
- Evolutionary Systems (1.00)
- Neural Networks (1.00)
- Statistical Learning
- Clustering (0.68)
- Regression (0.50)