Goto

Collaborating Authors

 computational biology




MODE: Learning compositional representations of complex systems with Mixtures Of Dynamical Experts

Quiblier, Nathan, Friedman, Roy, Ricci, Matthew

arXiv.org Artificial Intelligence

Dynamical systems in the life sciences are often composed of complex mixtures of overlapping behavioral regimes. Cellular subpopulations may shift from cycling to equilibrium dynamics or branch towards different developmental fates. The transitions between these regimes can appear noisy and irregular, posing a serious challenge to traditional, flow-based modeling techniques which assume locally smooth dynamics. To address this challenge, we propose MODE (Mixture Of Dynamical Experts), a graphical modeling framework whose neural gating mechanism decomposes complex dynamics into sparse, interpretable components, enabling both the unsupervised discovery of behavioral regimes and accurate long-term forecasting across regime transitions. Crucially, because agents in our framework can jump to different governing laws, MODE is especially tailored to the aforementioned noisy transitions. We evaluate our method on a battery of synthetic and real datasets from computational biology. First, we systematically benchmark MODE on an unsupervised classification task using synthetic dynamical snapshot data, including in noisy, few-sample settings. Next, we show how MODE succeeds on challenging forecasting tasks which simulate key cycling and branching processes in cell biology. Finally, we deploy our method on human, single-cell RNA sequencing data and show that it can not only distinguish proliferation from differentiation dynamics but also predict when cells will commit to their ultimate fate, a key outstanding challenge in computational biology.




Drug-Target Interaction/Affinity Prediction: Deep Learning Models and Advances Review

Vefghi, Ali, Rahmati, Zahed, Akbari, Mohammad

arXiv.org Artificial Intelligence

Drug discovery remains a slow and expensive process that involves many steps, from detecting the target structure to obtaining approval from the Food and Drug Administration (FDA), and is often riddled with safety concerns. Accurate prediction of how drugs interact with their targets and the development of new drugs by using better methods and technologies have immense potential to speed up this process, ultimately leading to faster delivery of life-saving medications. Traditional methods used for drug-target interaction prediction show limitations, particularly in capturing complex relationships between drugs and their targets. As an outcome, deep learning models have been presented to overcome the challenges of interaction prediction through their precise and efficient end results. By outlining promising research avenues and models, each with a different solution but similar to the problem, this paper aims to give researchers a better idea of methods for even more accurate and efficient prediction of drug-target interaction, ultimately accelerating the development of more effective drugs. A total of 180 prediction methods for drug-target interactions were analyzed throughout the period spanning 2016 to 2025 using different frameworks based on machine learning, mainly deep learning and graph neural networks. Additionally, this paper discusses the novelty, architecture, and input representation of these models.


Deciphering genomic codes using advanced NLP techniques: a scoping review

Cheng, Shuyan, Wei, Yishu, Zhou, Yiliang, Xu, Zihan, Wright, Drew N, Liu, Jinze, Peng, Yifan

arXiv.org Artificial Intelligence

Objectives: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of Natural Language Processing (NLP) techniques, particularly Large Language Models (LLMs) and transformer architectures, in deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. The goal of this review is to assess data and model accessibility in the most recent literature, gaining a better understanding of the existing capabilities and constraints of these tools in processing genomic sequencing data. Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, our scoping review was conducted across PubMed, Medline, Scopus, Web of Science, Embase, and ACM Digital Library. Studies were included if they focused on NLP methodologies applied to genomic sequencing data analysis, without restrictions on publication date or article type. Results: A total of 26 studies published between 2021 and April 2024 were selected for review. The review highlights that tokenization and transformer models enhance the processing and understanding of genomic data, with applications in predicting regulatory annotations like transcription-factor binding sites and chromatin accessibility. Discussion: The application of NLP and LLMs to genomic sequencing data interpretation is a promising field that can help streamline the processing of large-scale genomic data while also providing a better understanding of its complex structures. It has the potential to drive advancements in personalized medicine by offering more efficient and scalable solutions for genomic analysis. Further research is also needed to discuss and overcome current limitations, enhancing model transparency and applicability.


Opportunities in deep learning methods development for computational biology

Lee, Alex Jihun, Abbasi-Asl, Reza

arXiv.org Artificial Intelligence

Advances in molecular technologies underlie an enormous growth in the size of data sets pertaining to biology and biomedicine. These advances parallel those in the deep learning subfield of machine learning. Components in the differentiable programming toolbox that makes deep learning possible are allowing computer scientists to address an increasingly large array of problems with flexible and effective tools. However many of these tools have not fully proliferated into the computational biology and bioinformatics fields. In this perspective we survey some of these advances and highlight exemplary examples of their utilization in the biosciences, with the goal of increasing awareness among practitioners of emerging opportunities to blend expert knowledge with newly emerging deep learning architectural tools.


Hierarchical Multitask Structured Output Learning for Large-Scale Sequence Segmentation

Neural Information Processing Systems

We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output prediction often leads to difficult inference problems and hence requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit additional information from related learning tasks by means of hierarchical regularization. Training SO models on the combined set of examples from multiple tasks can easily become infeasible for real world applications. To be able to solve the optimization problems underlying multitask structured output learning, we propose an efficient algorithm based on bundle-methods. We demonstrate the performance of our approach in applications from the domain of computational biology addressing the key problem of gene finding. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) that the Hierarchical SO-MTL approach outperforms considered non-MTL methods.


Efficient Structure Learning of Markov Networks using L_1 -Regularization

Neural Information Processing Systems

Markov networks are commonly used in a wide variety of applications, ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally efficient method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods.