gene essentiality
Predicting gene essentiality and drug response from perturbation screens in preclinical cancer models with LEAP: Layered Ensemble of Autoencoders and Predictors
Bodinier, Barbara, Dissez, Gaetan, Bleistein, Linus, Dauvin, Antonin
Preclinical perturbation screens, where the effects of genetic, chemical, or environmental perturbations are systematically tested on disease models, hold significant promise for machine learning-enhanced drug discovery due to their scale and causal nature. Predictive models can infer perturbation responses for previously untested disease models based on molecular profiles. These in silico labels can expand databases and guide experimental prioritization. However, modelling perturbation-specific effects and generating robust prediction performances across diverse biological contexts remain elusive. We introduce LEAP (Layered Ensemble of Autoencoders and Predictors), a novel ensemble framework to improve robustness and generalization. LEAP leverages multiple DAMAE (Data Augmented Masked Autoencoder) representations and LASSO regressors. By combining diverse gene expression representation models learned from different random initializations, LEAP consistently outperforms state-of-the-art approaches in predicting gene essentiality or drug responses in unseen cell lines, tissues and disease models. Notably, our results show that ensembling representation models, rather than prediction models alone, yields superior predictive performance. Beyond its performance gains, LEAP is computationally efficient, requires minimal hyperparameter tuning and can therefore be readily incorporated into drug discovery pipelines to prioritize promising targets and support biomarker-driven stratification. The code and datasets used in this work are made publicly available.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
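The LEAP abstract above describes ensembling representation models rather than only prediction models. The sketch below illustrates that idea under loud assumptions: a seeded random projection with a tanh nonlinearity stands in for each DAMAE encoder (it is not a masked autoencoder), a small proximal-gradient (ISTA) solver stands in for the LASSO regressors, and member predictions are combined by a plain average. The data are synthetic, and all names (`make_encoder`, `fit_lasso`, `leap_like_predict`) are hypothetical and not taken from the authors' code.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_lasso(X, y, alpha=0.01, n_iter=500):
    """Minimise (1/2n)||y - Xw - b||^2 + alpha*||w||_1 with ISTA."""
    n, d = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centring handles the intercept
    lipschitz = np.linalg.norm(Xc, 2) ** 2 / n
    step = 1.0 / max(lipschitz, 1e-12)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ w - yc) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w, y_mean - x_mean @ w

def make_encoder(n_genes, n_latent, seed):
    # ASSUMPTION: stand-in for a trained encoder; only the seed differs.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(n_genes), size=(n_genes, n_latent))
    return lambda X: np.tanh(X @ W)

def leap_like_predict(X_train, y_train, X_test, seeds, n_latent=16):
    """One encoder + one LASSO per seed; average the member predictions."""
    preds = []
    for seed in seeds:
        encode = make_encoder(X_train.shape[1], n_latent, seed)
        w, b = fit_lasso(encode(X_train), y_train)
        preds.append(encode(X_test) @ w + b)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic "expression" matrix (cell lines x genes) and a response vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = X @ rng.normal(size=50) + 0.1 * rng.normal(size=120)
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

ensemble_pred, member_preds = leap_like_predict(
    X_train, y_train, X_test, seeds=range(5)
)
ensemble_mse = mse(y_test, ensemble_pred)
member_mses = [mse(y_test, p) for p in member_preds]
```

Because squared error is convex, the averaged prediction's error cannot exceed the mean error of the individual members. This says nothing about how large the gain is on real screens; that is what the abstract reports empirically.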
DeepHEN: quantitative prediction essential lncRNA genes and rethinking essentialities of lncRNA genes
Zhang, Hanlin, Cheng, Wenzheng
Gene essentiality refers to the degree to which a gene is necessary for the survival and reproductive efficacy of a living organism. Although the essentiality of non-coding genes has been documented, some aspects of it remain unknown. For example, we do not know how much sequence features and network spatial features each contribute to essentiality. In this work, we therefore propose DeepHEN to answer this question. By building a new lncRNA-protein-protein network and combining representation learning with graph neural networks, we build DeepHEN models that predict the essentiality of lncRNA genes. Compared to other methods for predicting the essentiality of lncRNA genes, our DeepHEN model not only tells whether sequence features or network spatial features have a greater influence on essentiality but also addresses the overfitting issue of those methods caused by the low number of essential lncRNA genes, as evidenced by the results of enrichment analysis.
Keywords: sample, graph neural network, representation learning, lncRNA-protein-protein network, essential non-coding genes
INTRODUCTION Gene essentiality refers to the degree to which a gene is necessary for the survival and reproductive success of a living system. Genes that are indispensable in fulfilling these functions are classified as essential genes[1]. The concept of gene essentiality is dynamic and influenced by the specific context in which it is assessed.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
Prediction of gene essentiality using machine learning and genome-scale metabolic models
The identification of essential genes, i.e. those that impair cell survival when deleted, requires large growth assays of knock-out strains. The complexity and cost of such experiments have triggered a growing interest in computational methods for gene essentiality prediction. In the case of metabolic genes, Flux Balance Analysis (FBA) is widely employed to predict essentiality under the assumption that cells maximize their growth rate. However, this approach implicitly assumes that knock-out strains optimize the same objectives as the wild-type, which excludes cases in which deletions cause large changes in cell physiology to meet other objectives for survival. Here we resolve this limitation with a novel machine learning approach that predicts essentiality directly from wild-type flux distributions. We first project the wild-type FBA solution onto a mass flow graph, a digraph with reactions as nodes and edge weights proportional to the mass transfer between reactions, and then train binary classifiers on the connectivity of graph nodes.
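The projection described above can be sketched on a toy network. This is a minimal illustration that assumes one particular way of making "edge weights proportional to the mass transfer" precise: for each metabolite, the flow from producing reaction i to consuming reaction j is (amount produced by i) × (amount consumed by j) / (total amount produced). It further assumes irreversible reactions with non-negative fluxes at steady state. The three-reaction pathway, the flux vector, and the function name `mass_flow_graph` are hypothetical; in practice the fluxes would come from an FBA solver.

```python
def mass_flow_graph(S, v):
    """Weighted adjacency matrix between reactions.

    S: stoichiometric matrix as a list of rows (metabolites x reactions).
    v: non-negative steady-state flux of each reaction.
    """
    n_rxn = len(v)
    M = [[0.0] * n_rxn for _ in range(n_rxn)]
    for row in S:  # one metabolite at a time
        produced = [max(row[i], 0.0) * v[i] for i in range(n_rxn)]
        consumed = [max(-row[j], 0.0) * v[j] for j in range(n_rxn)]
        total = sum(produced)
        if total == 0:
            continue  # metabolite carries no flow
        for i in range(n_rxn):
            for j in range(n_rxn):
                M[i][j] += produced[i] * consumed[j] / total
    return M

# Toy linear pathway: R0: -> A, R1: A -> B, R2: B ->
S = [
    [1, -1, 0],   # metabolite A
    [0, 1, -1],   # metabolite B
]
v = [2.0, 2.0, 2.0]  # hypothetical wild-type flux distribution

M = mass_flow_graph(S, v)

# Simple connectivity features per reaction node (weighted out- and in-degree).
out_strength = [sum(row) for row in M]
in_strength = [sum(M[i][j] for i in range(len(M))) for j in range(len(M))]
```

Per-node quantities such as `out_strength` and `in_strength` are one plausible kind of "connectivity" feature a binary classifier could be trained on. The abstract does not say which features the authors actually use.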