 gene network


Structure Learning with Side Information: Sample Complexity

Neural Information Processing Systems

The vertices represent the random variables (RVs), and the edges signify the conditional dependencies among them. Structure learning is the process of inferring the edges by observing realizations of the RVs, and it has applications in a wide range of technological, social, and biological networks. Learning the structure of graphs when the vertices are treated in isolation from inferential information known about them is well-investigated. In a wide range of domains, however, there often exists additional inferred knowledge about the structure, which can serve as valuable side information. For instance, the gene networks that represent different subtypes of the same cancer share similar edges across all subtypes and also have exclusive edges corresponding to each subtype, which yields partially similar graphical models for gene expression in different cancer subtypes.


A scalable gene network model of regulatory dynamics in single cells

Bertin, Paul, Viviano, Joseph D., Tejada-Lapuerta, Alejandro, Wang, Weixu, Bauer, Stefan, Theis, Fabian J., Bengio, Yoshua

arXiv.org Artificial Intelligence

Single-cell data provide high-dimensional measurements of the transcriptional states of cells, but extracting insights into the regulatory functions of genes, particularly identifying transcriptional mechanisms affected by biological perturbations, remains a challenge. Many perturbations induce compensatory cellular responses, making it difficult to distinguish direct from indirect effects on gene regulation. Modeling how gene regulatory functions shape the temporal dynamics of these responses is key to improving our understanding of biological perturbations. Dynamical models based on differential equations offer a principled way to capture transcriptional dynamics, but their application to single-cell data has been hindered by computational constraints, stochasticity, sparsity, and noise. Existing methods either rely on low-dimensional representations or make strong simplifying assumptions, limiting their ability to model transcriptional dynamics at scale. We introduce a Functional and Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions. Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale, provides improved functional insights into transcriptional mechanisms perturbed by gene knockouts, both in myeloid differentiation and K562 Perturb-seq experiments, and simulates single-cell trajectories of A549 cells following small-molecule perturbations.
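
As a rough illustration of what "incorporating gene network structure into coupled differential equations" can mean, the sketch below integrates a toy linear system in which a gene's rate of change depends only on its regulators in a fixed adjacency matrix. This is not the FLeCS model: the linear form, the `decay` term, the explicit Euler integrator, and all names (`derivative`, `simulate`, `adjacency`, `weights`) are assumptions made for this example.

```python
def derivative(x, adjacency, weights, decay):
    """Rate of change of each gene under a hypothetical linear, network-masked model.

    dx_i/dt = sum_j adjacency[i][j] * weights[i][j] * x[j] - decay * x[i]
    Only edges present in `adjacency` can contribute to the regulation term.
    """
    n = len(x)
    dx = []
    for i in range(n):
        regulation = sum(adjacency[i][j] * weights[i][j] * x[j] for j in range(n))
        dx.append(regulation - decay * x[i])
    return dx


def simulate(x0, adjacency, weights, decay=0.5, dt=0.01, steps=100):
    """Integrate the system with explicit Euler steps and return every state."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = derivative(x, adjacency, weights, decay)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trajectory.append(x)
    return trajectory


# Toy 3-gene network (illustrative values only):
# gene 0 activates gene 1, and gene 1 represses gene 2.
adjacency = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
weights = [[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, -0.6, 0.0]]
trajectory = simulate([1.0, 0.0, 1.0], adjacency, weights)
print(len(trajectory), trajectory[-1])
```

In a learnable version of such a model, the entries of `weights` would be fitted to (pseudo)time-series data while `adjacency` stays fixed, so that the prior network constrains which interactions can be learned.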


scBIT: Integrating Single-cell Transcriptomic Data into fMRI-based Prediction for Alzheimer's Disease Diagnosis

Huang, Yu-An, Hu, Yao, Li, Yue-Chao, Cao, Xiyue, Li, Xinyuan, Tan, Kay Chen, You, Zhu-Hong, Huang, Zhi-An

arXiv.org Artificial Intelligence

Functional MRI (fMRI) and single-cell transcriptomics are pivotal in Alzheimer's disease (AD) research, each providing unique insights into neural function and molecular mechanisms. However, integrating these complementary modalities remains largely unexplored. Here, we introduce scBIT, a novel method for enhancing AD prediction by combining fMRI with single-nucleus RNA (snRNA). scBIT leverages snRNA as an auxiliary modality, significantly improving fMRI-based prediction models and providing comprehensive interpretability. It employs a sampling strategy to segment snRNA data into cell-type-specific gene networks and utilizes a self-explainable graph neural network to extract critical subgraphs. Additionally, we use demographic and genetic similarities to pair snRNA and fMRI data across individuals, enabling robust cross-modal learning. Extensive experiments validate scBIT's effectiveness in revealing intricate brain region-gene associations and enhancing diagnostic prediction accuracy. By advancing brain imaging transcriptomics to the single-cell level, scBIT sheds new light on biomarker discovery in AD research. Experimental results show that incorporating snRNA data into the scBIT model significantly boosts accuracy, improving binary classification by 3.39% and five-class classification by 26.59%. The code is implemented in Python and has been released on GitHub (https://github.com/77YQ77/scBIT) and Zenodo (https://zenodo.org/records/11599030) with detailed instructions.


GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Yang, Ziwei, Chen, Zheng, Liu, Xin, Kotoge, Rikuto, Chen, Peng, Matsubara, Yasuko, Sakurai, Yasushi, Sun, Jimeng

arXiv.org Artificial Intelligence

Retrieving gene functional networks from knowledge databases presents a challenge due to the mismatch between disease networks and subtype-specific variations. Current solutions, including statistical and deep learning methods, often fail to effectively integrate gene interaction knowledge from databases or explicitly learn subtype-specific interactions. To address this mismatch, we propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes. Graphs generated by such representations can be considered subtype-specific networks. GeSubNet is a multi-step representation learning framework with three modules: First, a deep generative model learns distinct disease subtypes from patient gene expression profiles. Second, a graph neural network captures representations of prior gene networks from knowledge databases, ensuring accurate physical gene interactions. Finally, we integrate these two representations using an inference loss that leverages graph generation capabilities, conditioned on the patient separation loss, to refine subtype-specific information in the learned representation. GeSubNet consistently outperforms traditional methods, with average improvements of 30.6%, 21.0%, 20.1%, and 56.6% across four graph evaluation metrics, averaged over four cancer datasets. Particularly, we conduct a biological simulation experiment to assess how the behavior of selected genes from over 11,000 candidates affects subtypes or patient distributions. The results show that the generated network has the potential to identify subtype-specific genes with an 83% likelihood of impacting patient distribution shifts. The GeSubNet resource is available: https://anonymous.4open.science/r/GeSubNet/


Engineering morphogenesis of cell clusters with differentiable programming

Deshpande, Ramya, Mottes, Francesco, Vlad, Ariana-Dalia, Brenner, Michael P., Co, Alma dal

arXiv.org Artificial Intelligence

Understanding the rules underlying organismal development is a major unsolved problem in biology. Each cell in a developing organism responds to signals in its local environment by dividing, excreting, consuming, or reorganizing, yet how these individual actions coordinate over a macroscopic number of cells to grow complex structures with exquisite functionality is unknown. Here we use recent advances in automatic differentiation to discover local interaction rules and genetic networks that yield emergent, systems-level characteristics in a model of development. We consider a growing tissue in which cellular interactions are mediated by morphogen diffusion, differential cell adhesion, and mechanical stress. Each cell has an internal genetic network that it uses to make decisions based on its local environment. We show that one can simultaneously learn parameters governing the cell interactions and the genetic network for complex developmental scenarios, including the symmetry breaking of an embryo from an initial cell, the creation of emergent chemical gradients, homogenization of growth via mechanical stress, programmed growth into a prespecified shape, and the ability to repair from damage. When combined with recent experimental advances measuring spatio-temporal dynamics and gene expression of cells in a growing tissue, the methodology outlined here offers a promising path to unravelling the cellular basis of development.


Highly Accurate Disease Diagnosis and Highly Reproducible Biomarker Identification with PathFormer

Dong, Zehao, Zhao, Qihang, Payne, Philip R. O., Province, Michael A, Cruchaga, Carlos, Zhang, Muhan, Zhao, Tianyu, Chen, Yixin, Li, Fuhai

arXiv.org Artificial Intelligence

Biomarker identification is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis, for example through fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction (diagnosis) accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and intensive, complex signaling interactions among these targets. To resolve these two challenges, in this study we present a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, a priori knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer significantly outperformed existing GNN models in terms of prediction accuracy (30% accuracy improvement in disease diagnosis compared with existing GNN models) and reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.


Gene Teams are on the Field: Evaluation of Variants in Gene-Networks Using High Dimensional Modelling

Tuna, Suha, Gulec, Cagri, Yucesan, Emrah, Cirakoglu, Ayse, Arguden, Yelda Tarkan

arXiv.org Artificial Intelligence

In medical genetics, each genetic variant is evaluated as an independent entity regarding its clinical importance. However, in most complex diseases, variant combinations in specific gene networks, rather than the presence of a particular single variant, predominate. In the case of complex diseases, disease status can be evaluated by considering the success level of a team of specific variants. We propose a high dimensional modelling based method to analyse all the variants in a gene network together. To evaluate our method, we selected two gene networks, mTOR and TGF-Beta. For each pathway, we generated 400 control and 400 patient group samples. The mTOR and TGF-Beta pathways contain 31 and 93 genes of varying sizes, respectively. We produced Chaos Game Representation images for each gene sequence to obtain 2-D binary patterns. These patterns were arranged in succession, and a 3-D tensor structure was obtained for each gene network. Features for each data sample were acquired by applying Enhanced Multivariance Products Representation to the 3-D data. Features were split into training and testing vectors. Training vectors were employed to train a Support Vector Machines classification model. We achieved more than 96% and 99% classification accuracies for the mTOR and TGF-Beta networks, respectively, using a limited number of training samples.
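
The step from gene sequences to a 3-D tensor can be sketched as follows. This is a minimal illustration of the standard Chaos Game Representation, not the authors' implementation: the corner assignment for A, C, G, T, the image `size`, the handling of ambiguous symbols, and the function names `cgr_image` and `network_tensor` are all assumptions for this example.

```python
# Corner of the unit square assigned to each nucleotide.
# This is one common convention, assumed here; the paper's choice may differ.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_image(sequence, size=64):
    """Return a size x size binary Chaos Game Representation as nested lists."""
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1  # binary pattern: mark the visited pixel
    return image


def network_tensor(sequences, size=64):
    """Arrange one CGR image per gene in succession: shape (genes, size, size)."""
    return [cgr_image(seq, size) for seq in sequences]


# Toy sequences for illustration only; these are not real gene sequences.
genes = ["ACGTACGTTGCA", "GGGCCCATATAT", "TTTTACGACGAC"]
tensor = network_tensor(genes, size=8)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Because every gene maps to an image of the same size regardless of its length, genes "of varying sizes" can be stacked into one tensor per network before feature extraction.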


Network-based screen in iPSC-derived cells reveals therapeutic candidate for heart valve disease

Science

Small-molecule screens aimed at identifying therapeutic candidates traditionally search for molecules that affect one to several outputs at most, limiting discovery of true disease-modifying drugs. Theodoris et al. developed a machine-learning approach to identify small molecules that broadly correct gene networks dysregulated in a human induced pluripotent stem cell disease model of a common form of heart disease involving the aortic valve. Gene network correction by the most efficacious therapeutic candidate generalized to primary aortic valve cells derived from more than 20 patients with sporadic aortic valve disease and prevented aortic valve disease in vivo in a mouse model.

Science, this issue p. eabd0724 (doi: 10.1126/science.abd0724)

INTRODUCTION: Determining the gene-regulatory networks that drive human disease allows the design of therapies that target the core disease mechanism rather than merely managing symptoms. However, small molecules used as therapeutic agents are traditionally screened for their effects on only one to several outputs at most, from which their predicted efficacy on the disease as a whole is extrapolated. In silico correlation of disease network dysregulation with pathways affected by molecules in surrogate cell types is limited by the relevance of the cell types used and by not directly testing compounds in patient cells.

RATIONALE: In principle, mapping the architecture of the dysregulated network in disease-relevant cells differentiated from patient-derived induced pluripotent stem cells (iPSCs) and subsequent screening for small molecules that broadly correct the abnormal gene network could overcome this obstacle. Specifically, targeting normalization of the core regulatory elements that drive the disease process, rather than correction of peripheral downstream effectors that may not be disease modifying, would have the greatest likelihood of therapeutic success.

We previously demonstrated that haploinsufficiency of NOTCH1 can cause calcific aortic valve disease (CAVD), the third most common form of heart disease, and that the underlying mechanism involves derepression of osteoblast-like gene networks in cardiac valve cells. There is no medical therapy for CAVD, and in the United States alone, >100,000 surgical valve replacements are performed annually to relieve obstruction of blood flow from the heart. Many of these occur in the setting of a congenital aortic valve anomaly present in 1 to 2% of the population in which the aortic valve has two leaflets (bicuspid) rather than the normal three leaflets (tricuspid). Bicuspid valves in humans can also be caused by NOTCH1 mutations and predispose to early and more aggressive calcification in adulthood. Given that valve calcification progresses with age, a medical therapy that could slow or even arrest progression would have tremendous impact.

RESULTS: We developed a machine-learning approach to identify small molecules that sufficiently corrected gene network dysregulation in NOTCH1-haploinsufficient human iPSC-derived endothelial cells (ECs) such that they classified similar to NOTCH1 +/+ ECs derived from gene-corrected isogenic iPSCs. We screened 1595 small molecules for their effect on a signature of 119 genes representative of key regulatory nodes and peripheral genes from varied regions of the inferred NOTCH1-dependent network, assayed by targeted RNA sequencing (RNA-seq). Overall, eight molecules were validated to sufficiently correct the network signature such that NOTCH1 +/– ECs classified as NOTCH1 +/+ by the trained machine-learning algorithm. Of these, XCT790, an inverse agonist of estrogen-related receptor α (ERRα), had the strongest restorative effect on the key regulatory nodes SOX7 and TCF4 and on the network as a whole, as shown by full transcriptome RNA-seq.

Gene network correction by XCT790 generalized to human primary aortic valve ECs derived from explanted valves from >20 patients with nonfamilial CAVD. XCT790 was effective in broadly restoring dysregulated genes toward the normal state in both calcified tricuspid and bicuspid valves, including the key regulatory nodes SOX7 and TCF4. Furthermore, XCT790 was sufficient to prevent as well as treat already established aortic valve disease in vivo in a mouse model of Notch1 haploinsufficiency on a telomere-shortened background. XCT790 significantly reduced aortic valve thickness, the extent of calcification, and echocardiographic signs of valve stenosis in vivo. XCT790 also reduced the percentage of aortic valve cells expressing the osteoblast transcriptional regulator RUNX2, indicating a reduction in the osteogenic cell fate switch underlying CAVD. Whole-transcriptome RNA-seq in treated aortic valves showed that XCT790 broadly corrected the genes dysregulated in Notch1-haploinsufficient mice with shortened telomeres, and that treatment of diseased aortic valves promoted clustering of the transcriptome with that of healthy aortic valves.

CONCLUSION: Network-based screening that leverages iPSC and machine-learning technologies is an effective strategy to discover molecules with broadly restorative effects on gene networks dysregulated in human disease that can be validated in vivo. XCT790 represents an entry point for developing a much-needed medical therapy for calcification of the aortic valve, which may also affect the highly related and associated calcification of blood vessels. Given the efficacy of XCT790 in limiting valve thickening, the potential for XCT790 to alter the progression of childhood, and perhaps even fetal, valve stenosis also warrants further study. Application of this strategy to other human models of disease may increase the likelihood of identifying disease-modifying candidate therapies that are successful in vivo.

Figure: Network-correcting therapeutic candidate for heart disease. A gene network–based screening approach leveraging human disease-specific iPSCs and machine learning identified a therapeutic candidate, XCT790, which corrected the network dysregulation in genetically defined iPSC-derived endothelial cells and primary aortic valve endothelial cells from >20 patients with sporadic aortic valve disease. XCT790 was also effective in preventing and treating a mouse model of aortic valve disease. Illustration: Christina V. Theodoris

Mapping the gene-regulatory networks dysregulated in human disease would allow the design of network-correcting therapies that treat the core disease mechanism. However, small molecules are traditionally screened for their effects on one to several outputs at most, biasing discovery and limiting the likelihood of true disease-modifying drug candidates. Here, we developed a machine-learning approach to identify small molecules that broadly correct gene networks dysregulated in a human induced pluripotent stem cell (iPSC) disease model of a common form of heart disease involving the aortic valve (AV). Gene network correction by the most efficacious therapeutic candidate, XCT790, generalized to patient-derived primary AV cells and was sufficient to prevent and treat AV disease in vivo in a mouse model. This strategy, made feasible by human iPSC technology, network analysis, and machine learning, may represent an effective path for drug discovery.


Structure Learning with Side Information: Sample Complexity

Sihag, Saurabh, Tajer, Ali

Neural Information Processing Systems

The vertices represent the random variables (RVs), and the edges signify the conditional dependencies among them. Structure learning is the process of inferring the edges by observing realizations of the RVs, and it has applications in a wide range of technological, social, and biological networks. Learning the structure of graphs when the vertices are treated in isolation from inferential information known about them is well-investigated. In a wide range of domains, however, there often exists additional inferred knowledge about the structure, which can serve as valuable side information. For instance, the gene networks that represent different subtypes of the same cancer share similar edges across all subtypes and also have exclusive edges corresponding to each subtype, which yields partially similar graphical models for gene expression in different cancer subtypes.