regulatory network
DAG Learning from Zero-Inflated Count Data Using Continuous Optimization
Sato, Noriaki, Scutari, Marco, Kawano, Shuichi, Yamaguchi, Rui, Imoto, Seiya
We address network structure learning from zero-inflated count data by casting each node as a zero-inflated generalized linear model and optimizing a smooth, score-based objective under a directed acyclic graph constraint. Our Zero-Inflated Continuous Optimization (ZICO) approach uses node-wise likelihoods with canonical links and enforces acyclicity through a differentiable surrogate constraint combined with sparsity regularization. ZICO achieves superior performance with faster runtimes on simulated data. It also performs comparably to or better than common algorithms for reverse engineering gene regulatory networks. ZICO is fully vectorized and mini-batched, enabling learning on larger variable sets with practical runtimes in a wide range of domains.
Towards Identifiability of Interventional Stochastic Differential Equations
Zweig, Aaron, Lin, Zaikang, Azizi, Elham, Knowles, David
We study identifiability of stochastic differential equations (SDE) under multiple interventions. Our results give the first provable bounds for unique recovery of SDE parameters given samples from their stationary distributions. We give tight bounds on the number of necessary interventions for linear SDEs, and upper bounds for nonlinear SDEs in the small noise regime. We experimentally validate the recovery of true parameters in synthetic data, and motivated by our theoretical results, demonstrate the advantage of parameterizations with learnable activation functions in application to gene regulatory dynamics. Stochastic dynamical systems are ubiquitous as models for natural data. They are perfectly suited for application to time-series data, and therefore also a good candidate to characterize systems that reach a steady state in the limit. If a system is governed by some stochastic differential equation (SDE) and the same system is observed under different interventions, ideally one would learn the underlying parameters governing the dynamics, and guarantee accurate prediction under new interventions. However, in many natural settings, data is modeled as following an SDE even if one does not have access to explicit trajectories. Studies of ecological systems focus on the long-term survival of multiple species modeled by the quasi-stationary state of SDEs with environmental factors as perturbations (Hening & Li, 2021). The application of flow cytometry to protein signaling networks under perturbation (Sachs et al., 2005) is destructive and yields protein quantification at one time point, modeled using the stationary distributions of linear SDEs in V arando & Hansen (2020).
A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma
Innate resistance to anti-PD-1 immunotherapy remains a major clinical challenge in metastatic melanoma, with the underlying molecular networks being poorly understood. To address this, we constructed a dynamic Probabilistic Boolean Network model using transcriptomic data from patient tumor biopsies to elucidate the regulatory logic governing therapy response. We then employed a reinforcement learning agent to systematically discover optimal, multi-step therapeutic interventions and used explainable artificial intelligence to mechanistically interpret the agent's control policy. The analysis revealed that a precisely timed, 4-step temporary inhibition of the lysyl oxidase like 2 protein (LOXL2) was the most effective strategy. Our explainable analysis showed that this ''hit-and-run" intervention is sufficient to erase the molecular signature driving resistance, allowing the network to self-correct without requiring sustained intervention. This study presents a novel, time-dependent therapeutic hypothesis for overcoming immunotherapy resistance and provides a powerful computational framework for identifying non-obvious intervention protocols in complex biological systems.
Simulation-free Structure Learning for Stochastic Dynamics
Rimawi-Fine, Noah El, Stecklov, Adam, Nelson, Lucas, Blanchette, Mathieu, Tong, Alexander, Zhang, Stephen Y., Atanackovic, Lazar
Modeling dynamical systems and unraveling their underlying causal relationships is central to many domains in the natural sciences. Various physical systems, such as those arising in cell biology, are inherently high-dimensional and stochastic in nature, and admit only partial, noisy state measurements. This poses a significant challenge for addressing the problems of modeling the underlying dynamics and inferring the network structure of these systems. Existing methods are typically tailored either for structure learning or modeling dynamics at the population level, but are limited in their ability to address both problems together. In this work, we address both problems simultaneously: we present StructureFlow, a novel and principled simulation-free approach for jointly learning the structure and stochastic population dynamics of physical systems. We showcase the utility of StructureFlow for the tasks of structure learning from interventions and dynamical (trajectory) inference of conditional population dynamics. We empirically evaluate our approach on high-dimensional synthetic systems, a set of biologically plausible simulated systems, and an experimental single-cell dataset. We show that StructureFlow can learn the structure of underlying systems while simultaneously modeling their conditional population dynamics -- a key step toward the mechanistic understanding of systems behavior.
Modeling GRNs with a Probabilistic Categorical Framework
Jia, Yiyang, Wei, Zheng, Yang, Zheng, Peng, Guohong
Understanding the complex and stochastic nature of Gene Regulatory Networks (GRNs) remains a central challenge in systems biology. Existing modeling paradigms often struggle to effectively capture the intricate, multi-factor regulatory logic and to rigorously manage the dual uncertainties of network structure and kinetic parameters. In response, this work introduces the Probabilistic Categorical GRN(PC-GRN) framework. It is a novel theoretical approach founded on the synergistic integration of three core methodologies. Firstly, category theory provides a formal language for the modularity and composition of regulatory pathways. Secondly, Bayesian Typed Petri Nets (BTPNs) serve as an interpretable,mechanistic substrate for modeling stochastic cellular processes, with kinetic parameters themselves represented as probability distributions. The central innovation of PC-GRN is its end-to-end generative Bayesian inference engine, which learns a full posterior distribution over BTPN models (P (G, ฮ|D)) directly from data. This is achieved by the novel interplay of a GFlowNet, which learns a policy to sample network topologies, and a HyperNetwork, which performs amortized inference to predict their corresponding parameter distributions. The resulting framework provides a mathematically rigorous, biologically interpretable, and uncertainty-aware representation of GRNs, advancing predictive modeling and systems-level analysis.
STAGED: A Multi-Agent Neural Network for Learning Cellular Interaction Dynamics
Rocha, Joao F., Xu, Ke, Sun, Xingzhi, Krishna, Ananya, Bhaskar, Dhananjay, Mongeon, Blanche, Craig, Morgan, Gerstein, Mark, Krishnaswamy, Smita
The advent of single-cell technology has significantly improved our understanding of cellular states and subpopulations in various tissues under normal and diseased conditions by employing data-driven approaches such as clustering and trajectory inference. However, these methods consider cells as independent data points of population distributions. With spatial transcriptomics, we can represent cellular organization, along with dynamic cell-cell interactions that lead to changes in cell state. Still, key computational advances are necessary to enable the data-driven learning of such complex interactive cellular dynamics. While agent-based modeling (ABM) provides a powerful framework, traditional approaches rely on handcrafted rules derived from domain knowledge rather than data-driven approaches. To address this, we introduce Spatio Temporal Agent-Based Graph Evolution Dynamics(STAGED) integrating ABM with deep learning to model intercellular communication, and its effect on the intracellular gene regulatory network. Using graph ODE networks (GDEs) with shared weights per cell type, our approach represents genes as vertices and interactions as directed edges, dynamically learning their strengths through a designed attention mechanism. Trained to match continuous trajectories of simulated as well as inferred trajectories from spatial transcriptomics data, the model captures both intercellular and intracellular interactions, enabling a more adaptive and accurate representation of cellular dynamics.
Multi-omic Causal Discovery using Genotypes and Gene Expression
Asiedu, Stephen, Watson, David
Causal discovery in multi-omic datasets is crucial for understanding the bigger picture of gene regulatory mechanisms, but remains challenging due to high dimensionality, differentiation of direct from indirect relationships, and hidden confounders. We introduce GENESIS (GEne Network inference from Expression SIgnals and SNPs), a constraint-based algorithm that leverages the natural causal precedence of genotypes to infer ancestral relationships in transcriptomic data. Unlike traditional causal discovery methods that start with a fully connected graph, GENESIS initialises an empty ancestrality matrix and iteratively populates it with direct, indirect or non-causal relationships using a series of provably sound marginal and conditional independence tests. By integrating genotypes as fixed causal anchors, GENESIS provides a principled ``head start'' to classical causal discovery algorithms, restricting the search space to biologically plausible edges. We test GENESIS on synthetic and real-world genomic datasets. This framework offers a powerful avenue for uncovering causal pathways in complex traits, with promising applications to functional genomics, drug discovery, and precision medicine.
Machine Learning Methods for Gene Regulatory Network Inference
Hegde, Akshata, Nguyen, Tom, Cheng, Jianlin
Proper regulation of gene expression is essential to ensure that genes are activated only when necessary and that their activity is properly controlled [3]. The regulation of gene expression is achieved through understanding the intricate interactions between genes and other molecules. In this effort, Gene Regulatory Networks have emerged as a strong tool[2]. Gene regulatory networks (GRNs) are complex systems that determine the development, differentiation, and function of cells and organisms, as well as their response to environmental stimuli [4][5]. GRNs consist of genes, transcription factors (TFs), microRNAs, and other regulatory molecules that interact with each other to control gene expression [6]. The regulatory interactions between these molecules can form complex networks that exhibit emergent properties, such as robustness and adaptability [7]. In its simplest form, a GRN is a network of genes and their regulatory interactions, which govern the expression of these genes in response to various cellular cues. It is worth noting that in this definition, a transcription factor (TF) is considered a special kind of gene that may regulate the expression of other non-TF or TF genes. Each gene in the network acts as a node, and the regulatory interactions between genes are represented by directed edges connecting these nodes[8].
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Cui, Tianyu, Xu, Song-Jun, Moskalev, Artem, Li, Shuwei, Mansi, Tommaso, Prakash, Mangal, Liao, Rui
Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases, such as class imbalances of GT interactions, rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using textual embeddings prior and further boosts performance by 11.1% when integrating labeled data as priors.
GRNFormer: A Biologically-Guided Framework for Integrating Gene Regulatory Networks into RNA Foundation Models
Qiu, Mufan, Hu, Xinyu, Zhan, Fengwei, Yun, Sukwon, Peng, Jie, Zhang, Ruichen, Kailkhura, Bhavya, Yang, Jiekun, Chen, Tianlong
Foundation models for single-cell RNA sequencing (scRNA-seq) have shown promising capabilities in capturing gene expression patterns. However, current approaches face critical limitations: they ignore biological prior knowledge encoded in gene regulatory relationships and fail to leverage multi-omics signals that could provide complementary regulatory insights. In this paper, we propose GRNFormer, a new framework that systematically integrates multi-scale Gene Regulatory Networks (GRNs) inferred from multi-omics data into RNA foundation model training. Our framework introduces two key innovations. First, we introduce a pipeline for constructing hierarchical GRNs that capture regulatory relationships at both cell-type-specific and cell-specific resolutions. Second, we design a structure-aware integration framework that addresses the information asymmetry in GRNs through two technical advances: (1) A graph topological adapter using multi-head cross-attention to weight regulatory relationships dynamically, and (2) a novel edge perturbation strategy that perturb GRNs with biologically-informed co-expression links to augment graph neural network training. Comprehensive experiments have been conducted on three representative downstream tasks across multiple model architectures to demonstrate the effectiveness of GRNFormer. It achieves consistent improvements over state-of-the-art (SoTA) baselines: $3.6\%$ increase in drug response prediction correlation, $9.6\%$ improvement in single-cell drug classification AUC, and $1.1\%$ average gain in gene perturbation prediction accuracy.