Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training
Xu, Minghao, Song, Jiaze, Wu, Keming, Zhou, Xiangxin, Cui, Bin, Zhang, Wentao
Understanding the various properties of glycans with machine learning has shown preliminary promise. However, previous methods mainly focused on modeling the backbone structure of glycans as graphs of monosaccharides (i.e., sugar units), while neglecting the atomic structures underlying each monosaccharide, which are in fact important indicators of glycan properties. We fill this gap by introducing the GlycanAA model for All-Atom-wise Glycan modeling. GlycanAA models a glycan as a heterogeneous graph, with monosaccharide nodes representing its global backbone structure and atom nodes representing its local atomic-level structures. Based on such a graph, GlycanAA performs hierarchical message passing to capture interactions ranging from the local atomic level to the global monosaccharide level. To further enhance model capability, we pre-train GlycanAA on a high-quality unlabeled glycan dataset, deriving the PreGlycanAA model. We design a multi-scale mask prediction algorithm to endow the model with knowledge of the different levels of dependencies in a glycan. Extensive benchmark results show the superiority of GlycanAA over existing glycan encoders and verify the further improvements achieved by PreGlycanAA. We maintain all resources at https://github.com/kasawa1234/GlycanAA
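The two-level message passing described in the abstract can be sketched in a few lines. The following is a minimal illustrative example, not the GlycanAA implementation: atom features are first aggregated locally within each monosaccharide, the resulting monosaccharide embeddings then exchange messages along the backbone graph, and a final pooling step yields a glycan-level embedding. All names, dimensions, and the toy graph are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous glycan graph (illustrative, not the GlycanAA data model):
# 3 monosaccharide nodes, each owning a few atom nodes with 8-dim features.
atom_feats = {0: rng.normal(size=(6, 8)),
              1: rng.normal(size=(5, 8)),
              2: rng.normal(size=(7, 8))}
backbone_edges = [(0, 1), (1, 2)]  # monosaccharide-level connectivity

def hierarchical_message_passing(atom_feats, backbone_edges):
    # 1) Local step: mean-aggregate atom features within each monosaccharide.
    mono = np.stack([a.mean(axis=0) for a in atom_feats.values()])
    # 2) Global step: pass messages along the backbone graph.
    out = mono.copy()
    for u, v in backbone_edges:
        out[u] += mono[v]
        out[v] += mono[u]
    # 3) Readout: mean-pool monosaccharide embeddings into a glycan embedding.
    return out.mean(axis=0)

emb = hierarchical_message_passing(atom_feats, backbone_edges)
print(emb.shape)  # (8,)
```

A trained model would replace the mean aggregations with learned transformations and nonlinearities, but the local-then-global structure of the computation is the same.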
Higher-Order Message Passing for Glycan Representation Learning
Glycans are among the most complex biological sequences, with monosaccharides forming extended, non-linear structures. As post-translational modifications, they modulate protein structure, function, and interactions. Owing to their diversity and complexity, predictive models of glycan properties and functions remain insufficient. Graph Neural Networks (GNNs) are deep learning models designed to process and analyze graph-structured data. These architectures leverage the connectivity and relational information in graphs to learn effective representations of nodes, edges, and entire graphs. By iteratively aggregating information from neighboring nodes, GNNs capture complex patterns within graph data, making them particularly well suited for tasks such as link prediction or graph classification across domains. This work presents a new model architecture based on combinatorial complexes and higher-order message passing to extract features from glycan structures into a latent-space representation. The architecture is evaluated on an improved GlycanML benchmark suite, establishing new state-of-the-art performance. We envision that these improvements will spur further advances in computational glycosciences and reveal the roles of glycans in biology.
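Higher-order message passing generalizes the node-to-node scheme above: cells of different ranks (e.g., atoms as rank-0 cells, monosaccharide rings as higher-rank cells) exchange messages through incidence rather than adjacency. The sketch below is a hypothetical toy version of this idea, not the paper's architecture: each higher-order cell aggregates its member nodes, then each node receives messages back from every cell that contains it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative higher-order message passing on a tiny combinatorial complex
# (hypothetical structure): rank-0 cells = 6 atoms with 4-dim features,
# higher-rank cells = two monosaccharide rings sharing atom 2.
node_x = rng.normal(size=(6, 4))
cells = [[0, 1, 2], [2, 3, 4, 5]]

def higher_order_step(node_x, cells):
    # Lift: each higher-order cell aggregates its member nodes.
    cell_x = np.stack([node_x[c].mean(axis=0) for c in cells])
    # Project: each node receives messages from all cells containing it.
    out = node_x.copy()
    for ci, members in enumerate(cells):
        for n in members:
            out[n] += cell_x[ci]
    return out, cell_x

node_out, cell_out = higher_order_step(node_x, cells)
print(node_out.shape, cell_out.shape)  # (6, 4) (2, 4)
```

Because features live on cells of every rank, such models can propagate information through shared substructures (like a ring) in one hop, which plain pairwise message passing cannot do.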
GlycoNMR: Dataset and benchmarks for NMR chemical shift prediction of carbohydrates with graph neural networks
Chen, Zizhang, Badman, Ryan Paul, Foley, Lachele, Woods, Robert, Hong, Pengyu
Molecular representation learning (MRL) is a powerful tool for bridging the gap between machine learning and the chemical sciences, as it converts molecules into numerical representations while preserving their chemical features. These encoded representations serve as a foundation for various downstream biochemical studies, including property prediction and drug design. MRL has had great success with proteins and general biomolecule datasets. Yet, in the growing sub-field of glycoscience (the study of carbohydrates, where longer carbohydrates are also called glycans), MRL methods have been barely explored. This under-exploration can be attributed primarily to the limited availability of comprehensive, well-curated carbohydrate-specific datasets and the lack of machine learning (ML) pipelines tailored to the unique problems posed by carbohydrate data. Since interpreting and annotating carbohydrate-specific data is generally more complicated than for protein data, domain experts usually need to be involved. The existing MRL methods, predominantly optimized for proteins and small biomolecules, also cannot be directly applied to carbohydrates without special modifications. To address this challenge, accelerate progress in glycoscience, and enrich the data resources of the MRL community, we introduce GlycoNMR. GlycoNMR contains two laboriously curated datasets with 2,609 carbohydrate structures and 211,543 annotated nuclear magnetic resonance (NMR) chemical shifts for precise atomic-level prediction. We tailored carbohydrate-specific features and adapted existing MRL models to tackle this problem effectively. For illustration, we benchmark four modified MRL models on our new datasets.
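Chemical-shift prediction of the kind benchmarked in GlycoNMR is an atom-level regression task: a GNN produces one embedding per atom, and a small head maps each embedding to a scalar shift. The sketch below is a hypothetical minimal setup, not the GlycoNMR pipeline; the chain connectivity, dimensions, and linear head are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical atom-level shift regression: one round of mean aggregation
# from neighbors, then a linear head giving a per-atom prediction (in ppm).
n_atoms, dim = 10, 16
x = rng.normal(size=(n_atoms, dim))   # per-atom input features
adj = np.zeros((n_atoms, n_atoms))
for i in range(n_atoms - 1):          # toy chain connectivity
    adj[i, i + 1] = adj[i + 1, i] = 1.0

def predict_shifts(x, adj, w, b):
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    h = x + adj @ x / deg             # mean aggregation from neighbors
    return h @ w + b                  # per-atom scalar prediction

w, b = rng.normal(size=(dim, 1)), 0.0
shifts = predict_shifts(x, adj, w, b)
print(shifts.shape)  # (10, 1)
```

Training would fit `w` and `b` (and the aggregation weights) against the annotated shifts by minimizing a regression loss such as mean squared error.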
Graph Convolutional Neural Networks to Analyze Complex Carbohydrates
Graph convolutional neural networks (GCNs) have attracted increasing attention over the last few years, with more and more disciplines finding uses for them. This has also extended into the life sciences, as GCNs have been used to analyze proteins, drugs, and of course biological networks. One key advantage of GCNs that has enabled this expansion is their ability to natively handle nonlinear data formats, in contrast to more linear data structures such as natural language. Because of this feature, we also implemented GCNs for our own topic of interest, the study of complex carbohydrates or glycans. Glycans are ubiquitous in biology, decorating every cell and playing key roles in processes such as viral infection or tumor immune evasion.
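A single graph convolution layer in the style commonly used for such analyses can be written compactly. This is an illustrative sketch with random toy inputs (real models stack several layers and learn the weights end-to-end): self-loops are added to the adjacency matrix, it is symmetrically normalized by node degree, and the normalized matrix mixes neighbor features before a linear transform and ReLU.

```python
import numpy as np

# Minimal graph convolution layer (illustrative sketch; weights here are
# random, whereas a real GCN learns them by training end-to-end).
def gcn_layer(a, x, w):
    a_hat = a + np.eye(a.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)    # ReLU activation

rng = np.random.default_rng(3)
a = np.array([[0, 1, 0],                      # toy 3-node path graph
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
x = rng.normal(size=(3, 5))                   # 5-dim node features
w = rng.normal(size=(5, 2))                   # learned weight matrix
h = gcn_layer(a, x, w)
print(h.shape)  # (3, 2)
```

For glycans, the nodes would be monosaccharides (or atoms) and the edges glycosidic linkages (or bonds), which is exactly the kind of non-linear structure that linear sequence models handle poorly.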