Collaborating Authors

 Ding, Yijie


A generalizable framework for unlocking missing reactions in genome-scale metabolic networks using deep learning

arXiv.org Artificial Intelligence

Incomplete knowledge of metabolic processes hinders the accuracy of GEnome-scale Metabolic models (GEMs), which in turn impedes advancements in systems biology and metabolic engineering. Existing gap-filling methods typically rely on phenotypic data to minimize the disparity between computational predictions and experimental results. However, there is still no automatic and precise gap-filling method for initial-state GEMs, that is, for models built before experimental data and annotated genomes become available. In this study, we introduce CLOSEgaps, a deep learning-driven tool that addresses the gap-filling issue by modeling it as a hyperedge prediction problem within GEMs. Specifically, CLOSEgaps maps metabolic networks as hypergraphs and learns their hyper-topology features to identify missing reactions and gaps by leveraging hypothetical reactions. This approach allows for the characterization and curation of both known and hypothetical reactions within metabolic networks. Extensive results demonstrate that CLOSEgaps accurately fills over 96% of artificially introduced gaps for various GEMs. Furthermore, CLOSEgaps enhances phenotypic predictions for 24 GEMs and shows a notable improvement in producing four crucial metabolites (Lactate, Ethanol, Propionate, and Succinate) in two organisms. As a broadly applicable solution for any GEM, CLOSEgaps represents a promising model to automate the gap-filling process and uncover missing connections between reactions and observed metabolic phenotypes.
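To make the hypergraph framing concrete, the following minimal sketch encodes a metabolic network as a metabolite-by-reaction incidence matrix, in which each reaction is a hyperedge joining all the metabolites it involves. It is an illustration of the general representation only, not the CLOSEgaps implementation: the toy reactions, the metabolite abbreviations, and the helper name `build_incidence` are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Toy reactions (hypothetical, for illustration only):
# reaction id -> metabolites taking part in that reaction.
REACTIONS: Dict[str, List[str]] = {
    "R1": ["glc", "atp", "g6p", "adp"],
    "R2": ["g6p", "f6p"],
    "R3": ["f6p", "atp", "fbp", "adp"],
}


def build_incidence(
    reactions: Dict[str, List[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (metabolites, reaction ids, incidence matrix).

    Rows are metabolites (hypergraph nodes), columns are reactions
    (hyperedges). An entry is 1 if the metabolite takes part in the reaction.
    """
    metabolites = sorted({m for mets in reactions.values() for m in mets})
    reaction_ids = list(reactions)
    row_of = {m: i for i, m in enumerate(metabolites)}
    matrix = [[0] * len(reaction_ids) for _ in metabolites]
    for col, rid in enumerate(reaction_ids):
        for m in reactions[rid]:
            matrix[row_of[m]][col] = 1
    return metabolites, reaction_ids, matrix


METABOLITES, REACTION_IDS, INCIDENCE = build_incidence(REACTIONS)

if __name__ == "__main__":
    for name, row in zip(METABOLITES, INCIDENCE):
        print(f"{name:>4} {row}")
```

Under this representation, a candidate missing reaction is simply one more column of the matrix, so hyperedge prediction amounts to scoring whether a proposed column belongs in the network. How that scoring is learned is outside the scope of this sketch.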


SBSM-Pro: Support Bio-sequence Machine for Proteins

arXiv.org Artificial Intelligence

Bio-sequences, which include DNA, RNA, and proteins, are the molecular foundation of modern genetic research. The classification of bio-sequences based on sequence information has been a key focus in bioinformatics research. With the successive completion of genome mapping from humans to various other species, we have amassed a vast amount of sequence data, creating an urgent need for computer-assisted annotation of sequence functions. Although it is statistically evident that genetic sequences determine hereditary diseases, the mechanisms by which sequence variations contribute to diseases are complex. It is difficult to address and interpret all these issues through one biological experiment; hence, multiple computer predictions are needed to guide wet lab exploration. In summary, the application of information science and machine learning to bio-sequence classification is a valuable tool for assisting researchers in comprehending and analysing bio-sequences, and it serves as a key driving force for advancing research in bioinformatics. In bio-sequence classification, machine learning methods broadly follow two strategies: feature extraction combined with traditional classification methods, and direct sequence classification via deep learning techniques. For bio-sequences, relevant features are mainly characterized as frequency, physicochemical, structural, and evolutionary features.
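As an illustration of the frequency features mentioned above, the sketch below computes normalized k-mer frequencies for a protein sequence; with k = 1 this is the familiar amino acid composition. It is a generic example of the feature-extraction strategy and is not taken from SBSM-Pro; the function name `kmer_frequencies` and the decision to skip windows containing non-standard residues are assumptions made here.

```python
from collections import Counter
from itertools import product
from typing import List

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence: str, k: int = 1) -> List[float]:
    """Return a vector of length 20**k with the relative frequency of each k-mer.

    Windows that contain a character outside AMINO_ACIDS are skipped
    (an assumption of this sketch). An empty or too-short sequence
    yields an all-zero vector.
    """
    alphabet = set(AMINO_ACIDS)
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(w for w in windows if set(w) <= alphabet)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    composition = kmer_frequencies("MKVLAAGA")
    print(dict(zip(AMINO_ACIDS, composition)))
```

A fixed-length vector like this can be passed to a traditional classifier, which is the first of the two strategies described above. Physicochemical, structural, and evolutionary features would need additional data sources that this sketch does not cover.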