 interactome


Transformation of Biological Networks into Images via Semantic Cartography for Visual Interpretation and Scalable Deep Analysis

Mostafa, Sakib, Xing, Lei, Islam, Md. Tauhidul

arXiv.org Artificial Intelligence

Complex biological networks are fundamental to biomedical science, capturing interactions among molecules, cells, genes, and tissues. Deciphering these networks is critical for understanding health and disease, yet their scale and complexity represent a daunting challenge for current computational methods. Traditional biological network analysis methods, including deep learning approaches, while powerful, face inherent challenges such as limited scalability, oversmoothing long-range dependencies, difficulty in multimodal integration, expressivity bounds, and poor interpretability. We present Graph2Image, a framework that transforms large biological networks into sets of two-dimensional images by spatially arranging representative network nodes on a 2D grid. This transformation decouples the nodes as images, enabling the use of convolutional neural networks (CNNs) with global receptive fields and multi-scale pyramids, thus overcoming limitations of existing biological network analysis methods in scalability, memory efficiency, and long-range context capture. Graph2Image also facilitates seamless integration with other imaging and omics modalities and enhances interpretability through direct visualization of node-associated images. When applied to several large-scale biological network datasets, Graph2Image improved classification accuracy by up to 67.2% over existing methods and provided interpretable visualizations that revealed biologically coherent patterns. It also allows analysis of very large biological networks (nodes > 1 billion) on a personal computer. Graph2Image thus provides a scalable, interpretable, and multimodal-ready approach for biological network analysis, offering new opportunities for disease diagnosis and the study of complex biological systems.


MLPrE -- A tool for preprocessing and exploratory data analysis prior to machine learning model construction

Maxwell, David S, Darkoh, Michael, Samudrala, Sidharth R, Chung, Caroline, Schmidt, Stephanie T, Al-Lazikani, Bissan

arXiv.org Artificial Intelligence

With the recent growth of Deep Learning for AI, there is a need for tools to meet the demand of data flowing into those models. In some cases, source data may exist in multiple formats, and therefore the source data must be investigated and properly engineered for a Machine Learning model or graph database. Overhead and lack of scalability with existing workflows limit integration within a larger processing pipeline such as Apache Airflow, driving the need for a robust, extensible, and lightweight tool to preprocess arbitrary datasets that scales with data type and size. To address this, we present Machine Learning Preprocessing and Exploratory Data Analysis, MLPrE, in which Spark DataFrames were utilized to hold data during processing and ensure scalability. A generalizable JSON input file format was utilized to describe stepwise changes to that DataFrame. Stages were implemented for input and output, filtering, basic statistics, feature engineering, and exploratory data analysis. A total of 69 stages were implemented in MLPrE, of which we highlight and demonstrate key stages using six diverse datasets. Using a UniProt glossary term dataset, we further highlight MLPrE's ability to independently process multiple fields in flat files and recombine them, a task that would otherwise require an additional pipeline. Building on this advantage, we demonstrated the clustering stage with available wine quality data. Lastly, we demonstrate the preparation of data for a graph database in the final stages of MLPrE using phosphosite kinase data. Overall, MLPrE offers a generalizable and scalable tool for preprocessing and early data analysis, filling a critical need given the ever-expanding use of machine learning. This tool serves to accelerate and simplify early-stage development in larger workflows.


Improving Disease Comorbidity Prediction Based on Human Interactome with Biologically Supervised Graph Embedding

Qin, Xihan, Liao, Li

arXiv.org Artificial Intelligence

Comorbidity carries significant implications for disease understanding and management. The genetic causes of comorbidity often trace back to mutations that occur either in the same gene associated with two diseases or in different genes that are associated with different diseases but are connected via protein-protein interactions. The human interactome has therefore been used in more sophisticated studies of disease comorbidity. The human interactome, as a large incomplete graph, presents its own challenges for extracting useful features for comorbidity prediction. In this work, we introduce a novel approach named Biologically Supervised Graph Embedding (BSE) that selects the most relevant features to enhance the prediction accuracy of comorbid disease pairs. Our investigation into BSE's impact on both centered and uncentered embedding methods showcases its consistent superiority over the state-of-the-art techniques and its adeptness in selecting dimensions enriched with vital biological insights, thereby improving prediction performance significantly, up to 50% when measured by ROC for some variations. Further analysis indicates that BSE consistently and substantially improves the ratio of disease associations to gene connectivity, affirming its potential in uncovering latent biological factors affecting comorbidity. The statistically significant enhancements across diverse metrics underscore BSE's potential to introduce novel avenues for precise disease comorbidity predictions and other potential applications. The GitHub repository containing the source code can be accessed at the following link: https://github.com/xihan-qin/Biologically-Supervised-Graph-Embedding.


Link prediction with continuous-time classical and quantum walks

Goldsmith, Mark, García-Pérez, Guillermo, Malmi, Joonas, Rossi, Matteo A. C., Saarinen, Harto, Maniscalco, Sabrina

arXiv.org Artificial Intelligence

Protein-protein interaction (PPI) networks consist of the physical and/or functional interactions between the proteins of an organism. Since the biophysical and high-throughput methods used to form PPI networks are expensive, time-consuming, and often contain inaccuracies, the resulting networks are usually incomplete. In order to infer missing interactions in these networks, we propose a novel class of link prediction methods based on continuous-time classical and quantum random walks. In the case of quantum walks, we examine the usage of both the network adjacency and Laplacian matrices for controlling the walk dynamics. We define a score function based on the corresponding transition probabilities and perform tests on four real-world PPI datasets. Our results show that continuous-time classical random walks and quantum walks using the network adjacency matrix can successfully predict missing protein-protein interactions, with performance rivalling the state of the art.
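
The idea of scoring candidate links by walk transition probabilities can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation. It assumes that the score for a node pair is simply the transition probability at a fixed time t: exp(-tL) for the classical walk, with L = D - A, and |exp(-iAt)|^2 for the adjacency-driven quantum walk. The function names, the value of t, and the toy graph are all hypothetical, and the abstract does not specify the paper's actual score function.

```python
# Illustrative sketch only. Assumption: link score = transition probability at time t.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30, squarings=6):
    """Matrix exponential via a truncated Taylor series with scaling and squaring."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def adjacency(n, edges):
    """Dense adjacency matrix of an undirected, unweighted graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


def classical_probs(adj, t):
    """Continuous-time classical walk: P(t) = exp(-t L), with L = D - A."""
    n = len(adj)
    lap = [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    return mat_exp([[-t * x for x in row] for row in lap])


def quantum_probs(adj, t):
    """Continuous-time quantum walk: P_ij(t) = |exp(-i A t)_ij|^2."""
    u = mat_exp([[-1j * t * x for x in row] for row in adj])
    return [[abs(z) ** 2 for z in row] for row in u]


def rank_missing_links(adj, probs):
    """Rank node pairs that are not yet connected by their transition probability."""
    n = len(adj)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j] == 0.0]
    scored = [((i, j), probs[i][j]) for i, j in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = adjacency(5, edges)
for pair, score in rank_missing_links(adj, classical_probs(adj, t=1.0)):
    print(pair, round(score, 4))
```

Passing `quantum_probs(adj, t)` instead of `classical_probs(adj, t)` to `rank_missing_links` gives the quantum-walk ranking. For networks of realistic size, a numerical library routine for the matrix exponential would replace the hand-written `mat_exp`.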


Artificial intelligence successfully predicts protein interactions

#artificialintelligence

UT Southwestern and University of Washington researchers led an international team that used artificial intelligence (AI) and evolutionary analysis to produce 3D models of eukaryotic protein interactions. The study, published in Science, identified more than 100 probable protein complexes for the first time and provided structural models for more than 700 previously uncharacterized ones. Insights into the ways pairs or groups of proteins fit together to carry out cellular processes could lead to a wealth of new drug targets. "Our results represent a significant advance in the new era in structural biology in which computation plays a fundamental role," said Qian Cong, Ph.D., Assistant Professor in the Eugene McDermott Center for Human Growth and Development with a secondary appointment in Biophysics. Dr. Cong led the study with David Baker, Ph.D., Professor of Biochemistry and Dr. Cong's postdoctoral mentor at the University of Washington prior to her recruitment to UT Southwestern.


Trainee Rounds seminars: AI in Medicine

#artificialintelligence

DATE: August 10, 2021 (Tuesday)
TIME: 12pm to 1pm
HOW: Zoom meeting
AUDIENCE: This event is open to the public.

Anastasia Razdaibiedina, PhD student, Computational Biology and Machine Learning, University of Toronto

Discovering gene-disease relationships with deep learning

Understanding the genetic causes of diseases is one of the central goals in medicine. Most diseases have a complex genetic basis, and genes often act in 'modules' to determine phenotypes. An effective way to discover a module of disease-associated genes is to use biological networks, or interactomes, that describe interactions between genes and proteins. Here we use deep learning methods to infer an interactome computationally from microscopy imaging data, and subsequently discover gene-disease relationships from the constructed interactome.


Artificial intelligence in COVID-19 drug repurposing

#artificialintelligence

One study estimated that pharmaceutical companies spent US$2.6 billion in 2015, up from $802 million in 2003, for the development of a new chemical entity approved by the US Food and Drug Administration (FDA) (N Engl J Med. 2015; 372: 1877-1879). The increasing cost of drug development is due to the large volume of compounds to be tested in preclinical stages and the high proportion of randomised controlled trials (RCTs) that do not find clinical benefits or with toxicity issues. Given the high attrition rates, substantial costs, and low pace of de-novo drug discovery, exploiting known drugs can help improve their efficacy while minimising side-effects in clinical trials. As Nobel Prize-winning pharmacologist Sir James Black said, "The most fruitful basis for the discovery of a new drug is to start with an old drug". New uses for old drugs.


Biological Random Walks: integrating heterogeneous data in disease gene prioritization

Gentili, Michele, Martini, Leonardo, Petti, Manuela, Farina, Lorenzo, Becchetti, Luca

arXiv.org Machine Learning

This work proposes a unified framework to leverage biological information in network propagation-based gene prioritization algorithms. Preliminary results on breast cancer data show significant improvements over state-of-the-art baselines, such as the prioritization of genes that are not identified as potential candidates by interactome-based algorithms but that appear to be involved in, or potentially related to, breast cancer, according to a functional analysis based on recent literature.
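
Network propagation for gene prioritization is commonly implemented as a random walk with restart, and the sketch below illustrates that generic technique rather than the authors' framework. It assumes an undirected, unweighted interactome and a restart vector given as node weights. Using non-uniform weights is one hypothetical way to inject biological information. The function names, the restart probability, and the toy graph are illustrative assumptions.

```python
# Illustrative sketch only: generic random walk with restart (RWR), not the paper's method.

def random_walk_with_restart(n, edges, restart_weights, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Return the stationary RWR distribution over nodes 0..n-1.

    restart_weights maps seed nodes to non-negative weights (normalized here).
    Non-uniform weights are a hypothetical way to encode prior biological knowledge.
    """
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    total = sum(restart_weights.values())
    p0 = [0.0] * n
    for node, weight in restart_weights.items():
        p0[node] = weight / total

    p = p0[:]
    for _ in range(max_iter):
        # With probability `restart`, jump back to the seed distribution.
        nxt = [restart * x for x in p0]
        for u in range(n):
            moving = (1.0 - restart) * p[u]
            if neighbors[u]:
                share = moving / len(neighbors[u])
                for v in neighbors[u]:
                    nxt[v] += share
            else:
                nxt[u] += moving  # an isolated node keeps its own mass
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


def prioritize(scores, seeds):
    """Rank non-seed nodes by decreasing propagation score."""
    candidates = [node for node in range(len(scores)) if node not in seeds]
    return sorted(candidates, key=lambda node: scores[node], reverse=True)


# Toy interactome (hypothetical): node 0 is the only known disease gene.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
scores = random_walk_with_restart(5, edges, {0: 1.0})
print(prioritize(scores, seeds={0}))
```

A larger `restart` value keeps the scores concentrated near the seed genes, while a smaller one lets the signal spread further across the network.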


Graph-Sparse Logistic Regression

LeNail, Alexander, Schmidt, Ludwig, Li, Johnathan, Ehrenberger, Tobias, Sachs, Karen, Jegelka, Stefanie, Fraenkel, Ernest

arXiv.org Machine Learning

We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.