heritability
Counterfactual explainability of black-box prediction models
It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.
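For readers unfamiliar with the global sensitivity analysis that this abstract builds on, the following is a minimal sketch of classical (associational, not counterfactual) first-order Sobol indices. The toy model, the two-level inputs, and all function names are assumptions made for illustration; they do not come from the paper. With independent inputs taking the values -1 and +1 with equal probability, the indices can be computed exactly by enumeration.

```python
from itertools import product
from statistics import fmean

# Hypothetical toy model (not from the paper): two main effects plus an interaction.
def model(x1, x2):
    return x1 + x2 + x1 * x2

# Each input is independent and takes these values with equal probability.
LEVELS = (-1.0, 1.0)

def variance(values):
    """Population variance of equally weighted values."""
    mean = fmean(values)
    return fmean([(v - mean) ** 2 for v in values])

def first_order_sobol(index):
    """Var(E[f | X_index]) / Var(f), computed exactly over the full grid."""
    grid = list(product(LEVELS, repeat=2))
    total_var = variance([model(*x) for x in grid])
    conditional_means = [
        fmean([model(*x) for x in grid if x[index] == level])
        for level in LEVELS
    ]
    return variance(conditional_means) / total_var

s1 = first_order_sobol(0)
s2 = first_order_sobol(1)
interaction = 1.0 - s1 - s2  # share of variance not explained by either input alone
print(s1, s2, interaction)
```

In this toy case each input explains one third of the variance on its own and the remaining third is attributable only to their interaction, which is the kind of decomposition the abstract extends to a causal setting.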
Persistent Homological State-Space Estimation of Functional Human Brain Networks at Rest
Chung, Moo K., Huang, Shih-Gu, Carroll, Ian C., Calhoun, Vince D., Goldsmith, H. Hill
The paper introduces a new data-driven topological data analysis (TDA) method for studying dynamically changing human functional brain networks obtained from resting-state functional magnetic resonance imaging (rs-fMRI). Leveraging persistent homology, a multiscale topological approach, we present a framework that incorporates the temporal dimension of brain network data. This allows for a more robust estimation of the topological features of dynamic brain networks. The method employs the Wasserstein distance to measure the topological differences between networks and demonstrates greater efficiency and performance than the commonly used $k$-means clustering in defining the state spaces of dynamic brain networks. Our method maintains robust performance across different scales and is especially suited for dynamic brain networks. In addition to the methodological advancement, the paper applies the proposed technique to analyze the heritability of overall brain network topology using a twin study design. The study investigates whether the dynamic pattern of brain networks is a genetically influenced trait, an area previously underexplored. By examining the state change patterns in twin brain networks, we make significant strides in understanding the genetic factors underlying dynamic brain network features. Furthermore, the paper makes its method accessible by providing MATLAB code, contributing to reproducibility and broader application.
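To make the role of the Wasserstein distance concrete, here is a minimal sketch in Python (the paper itself provides MATLAB code, which is not reproduced here). It assumes, for illustration only, that each network has already been summarized by an equal-length list of scalar topological feature values, such as the birth values of connected components. For such one-dimensional summaries, the optimal matching pairs the sorted values, so the distance reduces to a sort followed by a sum. The function name and the unnormalized sum are choices made for this sketch.

```python
def wasserstein_1d(a, b, p=2):
    """p-Wasserstein distance between two equal-size sets of scalar features.

    In one dimension the optimal matching pairs the i-th smallest value of
    `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes the same number of features")
    matched = zip(sorted(a), sorted(b))
    return sum(abs(x - y) ** p for x, y in matched) ** (1.0 / p)

# Hypothetical feature values for two networks.
network_a = [0.1, 0.5, 0.9]
network_b = [0.9, 0.1, 0.5]
print(wasserstein_1d(network_a, network_b))  # identical up to ordering
```

A clustering of network states could then use these pairwise distances in place of the Euclidean distances that $k$-means relies on.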
Genetic prediction of quantitative traits: a machine learner's guide focused on height
Bourguignon, Lucie, Weis, Caroline, Jutzeler, Catherine R., Adamer, Michael
Machine learning and deep learning have celebrated many successes in their application to biological problems, especially in the domain of protein folding. Another equally complex and important question has received relatively little attention from the machine learning community, namely the prediction of complex traits from genetics. Tackling this problem requires in-depth knowledge of the related genetics literature and awareness of various subtleties associated with genetic data. In this guide, we provide an overview for the machine learning community of current state-of-the-art models and the associated subtleties that need to be taken into consideration when developing new models for phenotype prediction. We use height as an example of a continuous-valued phenotype and provide an introduction to benchmark datasets, confounders, feature selection, and common metrics.
Common genetic variation influencing human white matter microstructure
The white matter of the brain, which is composed of axonal tracts connecting different brain regions, plays key roles in both normal brain function and a variety of neurological disorders. Zhao et al. combined detailed magnetic resonance imaging-based assessment of brain structures with genetic data on nearly 44,000 individuals (see the Perspective by Filley). On the basis of this comprehensive analysis, the authors identified structural and genetic abnormalities associated with neurological and psychiatric disorders, as well as some nondisease traits, thus creating a valuable resource and providing some insights into the underlying neurobiology. Science, abf3736, this issue p. [eabf3736][1]; see also abj1881, p. [1265][2]
### INTRODUCTION
White matter in the human brain serves a critical role in organizing distributed neural networks. Diffusion magnetic resonance imaging (dMRI) has enabled the study of white matter in vivo, showing that interindividual variations in white matter microstructure are associated with a wide variety of clinical outcomes. Although white matter differences in general population cohorts are known to be heritable, few common genetic variants influencing white matter microstructure have been identified.
### RATIONALE
To identify genetic variants influencing white matter microstructure, we conducted a genome-wide association study (GWAS) of dMRI data from 43,802 individuals across five data resources. We analyzed five major diffusion tensor imaging (DTI) model-derived parameters along 21 cerebral white matter tracts.
### RESULTS
In the discovery GWAS with 34,024 individuals of British ancestry, we replicated 42 of the 44 genomic regions discovered in the largest previous GWAS and identified 109 additional regions associated with white matter microstructure ($P < 2.3 \times 10^{-10}$, adjusted for the number of phenotypes studied). These results indicate strong polygenic influences on white matter microstructure.
Of the 151 regions, 52 passed the Bonferroni significance level ($P < 5 \times 10^{-5}$) in our analysis of nine independent validation datasets, including four with subjects of non-European ancestry. On average, common genetic variants explained 41% (standard error = 2%) of the variation in white matter microstructure. The 151 identified genomic regions can explain 32.3% of heritability for white matter microstructure, whereas the 44 previously identified genomic regions can only explain 11.7% of heritability. As a biological validation of our GWAS findings, we observed heritability enrichment within regulatory elements active in oligodendrocytes and other glia, whereas no enrichment was observed in neurons. These results are expected and suggest that genetic variation leads to changes in white matter microstructure by affecting gene regulation in glia. We observed genetic correlations and colocalizations of white matter microstructure with a wide range of brain-related complex traits and diseases, such as cognitive functions, cardiovascular risk factors, as well as various neurological and psychiatric diseases. For example, of the 25 reported genetic risk regions of glioma, 11 were also associated with white matter microstructure, which illustrates the close genetic relationship between glioma and white matter integrity. Additionally, we found that 14 white matter microstructure-associated genes ($P < 1.2 \times 10^{-8}$) were targets for 79 commonly used nervous system drugs, such as antipsychotics, antidepressants, anticonvulsants, and drugs for Parkinson's disease and dementia.
### CONCLUSION
This large-scale study of dMRI scans from 43,802 subjects improved our understanding of the highly polygenic genetic architecture of human brain white matter tracts. We identified 151 genomic regions associated with white matter microstructure. The GWAS findings were supported by enrichments within cell types that make up white matter microstructure.
Moreover, we uncovered genetic relationships between white matter and various clinical endpoints, such as stroke, major depressive disorder, schizophrenia, and attention deficit hyperactivity disorder. The targets of many drugs commonly used for disabling cognitive disorders have genetic associations with white matter, which suggests that the neuropharmacology of many disorders can potentially be improved by studying how these medications work in the brain white matter.
Figure: Identifying genetic variants influencing human brain white matter microstructure. (Top left) Quantifying the microstructure in white matter tracts using DTI models. (Bottom left) Genomic locations of common genetic variants associated with white matter microstructure. (Top right) Selected genetic correlations between white matter microstructure and brain disorders (stroke and major depressive disorder). (Bottom right) Partitioned heritability enrichment analysis in brain cell types. FDR, false discovery rate.
Brain regions communicate with each other through tracts of myelinated axons, commonly referred to as white matter. We identified common genetic variants influencing white matter microstructure using diffusion magnetic resonance imaging of 43,802 individuals. Genome-wide association analysis identified 109 associated loci, 30 of which were detected by tract-specific functional principal components analysis. A number of loci colocalized with brain diseases, such as glioma and stroke. Genetic correlations were observed between white matter microstructure and 57 complex traits and diseases. Common variants associated with white matter microstructure altered the function of regulatory elements in glial cells, particularly oligodendrocytes. This large-scale tract-specific study advances the understanding of the genetic architecture of white matter and its genetic links to a wide spectrum of clinical outcomes.
[1]: /lookup/doi/10.1126/science.abf3736
[2]: /lookup/doi/10.1126/science.abj1881
Accurate Genomic Prediction Of Human Height
Lello, Louis, Avery, Steven G., Tellier, Laurent, Vazquez, Ana, Campos, Gustavo de los, Hsu, Stephen D. H.
We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, $\sim$40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate $\sim$0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the "missing heritability" problem -- i.e., the gap between prediction R-squared and SNP heritability. The $\sim$20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
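The two figures quoted for height can be checked against each other. Assuming that "variance captured" means the squared correlation between predicted and actual values (an interpretation, not something the abstract states explicitly), the reported correlation implies:

```latex
\[
  r \approx 0.65
  \quad\Longrightarrow\quad
  R^2 = r^2 \approx 0.65^2 = 0.4225 \approx 42\%,
\]
```

which is consistent with the roughly 40 percent of total variance reported for height.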
Face map shows the features you're likely to inherit
You're more likely to have your mother's cheekbones than her eyes, new research suggests. Researchers studied the facial features of 1,000 female twins to find parts of the face that are likely to be controlled by genetics. They used their results to create interactive face maps that reveal the features you're most likely to inherit from your parents. Biological traits such as facial features are influenced by genes and 'environmental' factors including the socioeconomic conditions a person grew up in. Professor Giovanni Montana, from King's College London, said: 'The notion that our genes control our face is self-evident.
Mapping Heritability of Large-Scale Brain Networks with a Billion Connections via Persistent Homology
Chung, Moo K., Vilalta-Gil, Victoria, Rathouz, Paul J., Lahey, Benjamin B., Zald, David H.
In many human brain network studies, we do not have a sufficient number (n) of images relative to the number (p) of voxels due to the prohibitively expensive cost of scanning enough subjects. Thus, brain network models usually suffer from the small-n large-p problem. Such a problem is often remedied by sparse network models, which are usually solved numerically by optimizing L1-penalties. Unfortunately, due to the computational bottleneck associated with optimizing L1-penalties, it is not practical to apply such methods to construct large-scale brain networks at the voxel level. In this paper, we propose a new scalable sparse network model using cross-correlations that bypasses the computational bottleneck. Our model can build sparse brain networks at the voxel level with p > 25000. Instead of using a single sparse parameter that may not be optimal in other studies and datasets, the computational speed gain enables us to analyze the collection of networks at every possible sparse parameter in a coherent mathematical framework via persistent homology. The method is subsequently applied in determining the extent of heritability on a functional brain network at the voxel level for the first time using twin fMRI.
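The idea of analyzing networks at every sparse parameter at once can be illustrated with a small sketch. It is not the authors' implementation: it assumes that sparsity is imposed by keeping only edges whose correlation exceeds a threshold, and it tracks just one topological feature, the number of connected components, across thresholds. The correlation matrix and all function names are hypothetical.

```python
def count_components(n, edges):
    """Number of connected components of a graph on n nodes (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1
    return components

def betti0_curve(corr, thresholds):
    """Component counts of the thresholded network at each sparse parameter."""
    n = len(corr)
    curve = []
    for t in thresholds:
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if corr[i][j] > t]
        curve.append(count_components(n, edges))
    return curve

# Hypothetical 4-node correlation matrix.
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]
print(betti0_curve(corr, [0.0, 0.25, 0.5, 0.8, 0.95]))  # [1, 1, 2, 3, 4]
```

As the threshold rises the network fragments, so the curve never decreases; comparing such curves between subjects avoids committing to any single sparse parameter.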
Locally epistatic genomic relationship matrices for genomic association, prediction and selection
As the amount and complexity of genetic information increase, it is necessary that we explore efficient ways of handling these data. This study takes the "divide and conquer" approach for analyzing high dimensional genomic data. Our aims include reducing the dimensionality of the problem that has to be dealt with at any one time and improving the performance and interpretability of the models. We propose using the inherent structures in the genome to divide the bigger problem into manageable parts. In plant and animal breeding studies a distinction is made between the commercial value (additive + epistatic genetic effects) and the breeding value (additive genetic effects) of an individual, since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this paper, we argue that the breeder can take advantage of some of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and combine the local additive and epistatic effects. To this end, we have used semi-parametric mixed models with multiple local genomic relationship matrices with hierarchical testing designs and lasso post-processing for sparsity in the final model and speed. Our models produce good predictive performance along with genetic association information.
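The "divide and conquer" idea of one relationship matrix per genomic region can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: markers are split into fixed-size windows rather than by genetic map position, each window yields a simple additive relationship matrix from column-centered marker codes, and no epistatic kernel, mixed model, or lasso step is included. All names and the toy genotypes are hypothetical.

```python
def center_columns(markers):
    """Subtract each marker's mean across individuals."""
    n, m = len(markers), len(markers[0])
    means = [sum(row[j] for row in markers) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in markers]

def local_grm(markers, start, stop):
    """Relationship matrix Z Z^T / m using only markers in [start, stop)."""
    z = center_columns([row[start:stop] for row in markers])
    n, m = len(z), stop - start
    return [[sum(z[a][k] * z[b][k] for k in range(m)) / m
             for b in range(n)] for a in range(n)]

def local_grms(markers, window_size):
    """One relationship matrix per consecutive window of markers."""
    m = len(markers[0])
    return [local_grm(markers, s, min(s + window_size, m))
            for s in range(0, m, window_size)]

# Hypothetical genotypes: 3 individuals x 4 markers, coded 0/1/2.
genotypes = [
    [0, 2, 1, 1],
    [2, 0, 1, 1],
    [1, 1, 0, 2],
]
grms = local_grms(genotypes, window_size=2)
print(len(grms), grms[0])
```

Each local matrix could then enter a model as its own variance component, which is what allows region-level effects to be tested and selected separately.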