accurate protein structure prediction
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Zhang, Xuan, Wang, Limei, Helwig, Jacob, Luo, Youzhi, Fu, Cong, Xie, Yaochen, Liu, Meng, Lin, Yuchao, Xu, Zhao, Yan, Keqiang, Adams, Keir, Weiler, Maurice, Li, Xiner, Fu, Tianfan, Wang, Yucheng, Yu, Haiyang, Xie, YuQing, Fu, Xiang, Strasser, Alex, Xu, Shenglong, Liu, Yi, Du, Yuanqi, Saxton, Alexandra, Ling, Hongyi, Lawrence, Hannah, Stärk, Hannes, Gui, Shurui, Edwards, Carl, Gao, Nicholas, Ladera, Adriana, Wu, Tailin, Hofgard, Elyssa F., Tehrani, Aria Mansouri, Wang, Rui, Daigavane, Ameya, Bohde, Montgomery, Kurtin, Jerry, Huang, Qian, Phung, Tuong, Xu, Minkai, Joshi, Chaitanya K., Mathis, Simon V., Azizzadenesheli, Kamyar, Fang, Ada, Aspuru-Guzik, Alán, Bekkers, Erik, Bronstein, Michael, Zitnik, Marinka, Anandkumar, Anima, Ermon, Stefano, Liò, Pietro, Yu, Rose, Günnemann, Stephan, Leskovec, Jure, Ji, Heng, Sun, Jimeng, Barzilay, Regina, Jaakkola, Tommi, Coley, Connor W., Qian, Xiaoning, Qian, Xiaofeng, Smidt, Tess, Ji, Shuiwang
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.
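As a small illustration of the symmetry idea in the abstract above, the following sketch checks numerically that interatomic distances are invariant under a rotation, while the centroid is equivariant (it rotates along with the input). The function names, the toy coordinates, and the choice of a rotation about the z-axis are assumptions made for illustration; none of them come from the paper.

```python
import math

def rotate_z(points, theta):
    """Rotate 3D points about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def centroid(points):
    """Mean position of the points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def all_close(a, b, tol=1e-9):
    """True if two equal-length sequences of floats agree within tol."""
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))

# Hypothetical toy "molecule": three atoms in arbitrary units.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.4)]
rotated = rotate_z(atoms, 0.7)

# Invariance: f(R x) == f(x) for the distance matrix.
d_before = [d for row in pairwise_distances(atoms) for d in row]
d_after = [d for row in pairwise_distances(rotated) for d in row]
distances_invariant = all_close(d_before, d_after)

# Equivariance: f(R x) == R f(x) for the centroid.
centroid_equivariant = all_close(
    centroid(rotated), rotate_z([centroid(atoms)], 0.7)[0]
)
```

Both flags should come out true here. A learned model is invariant or equivariant in the same sense when the corresponding identity holds for every input and every transformation in the symmetry group, not just for one sampled rotation as in this check.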
Artificial intelligence in structural biology is here to stay
"I didn't think we would get to this point in my lifetime." That's how one research leader in structural biology responded to last week's publication of research in which artificial intelligence (AI) was used to predict the structure of more than 20,000 human proteins, as well as that of nearly all the known proteins produced by 20 model organisms such as Escherichia coli, fruit flies and yeast, but also soya bean and Asian rice. That is a combined total of around 365,000 predictions1. The data, publicly accessible for the first time (see https://alphafold.ebi.ac.uk), were released online on 22 July by researchers at DeepMind, a London-based AI company owned by Google's parent company, Alphabet, and the European Bioinformatics Institute, based at the European Molecular Biology Laboratory (EBI-EMBL) near Cambridge, UK. DeepMind's AI predicts structures for a vast trove of proteins The DeepMind team developed a machine-learning tool called AlphaFold.
DeepMind and EMBL release database of predicted protein structures
Image: T-cell immunomodulatory protein homolog, from the AlphaFold Protein Structure Database, reproduced under a CC-BY-4.0 license. DeepMind and the European Molecular Biology Laboratory (EMBL) have partnered to produce a database of predicted protein structure models. The first release covers all 20,000 proteins expressed in the human proteome, and the proteomes of 20 other biologically significant organisms, totalling over 350,000 structures. In the coming months they plan to expand the database to cover a large proportion of all catalogued proteins (the over 100 million in UniRef90). The data is freely and openly available to the scientific community. You can access the AlphaFold Protein Structure Database at https://alphafold.ebi.ac.uk.
Accurate Protein Structure Prediction by Embeddings and Deep Learning Representations
Drori, Iddo, Thaker, Darshan, Srivatsa, Arjun, Jeong, Daniel, Wang, Yueqi, Nan, Linyong, Wu, Fan, Leggas, Dimitri, Lei, Jinhao, Lu, Weiyi, Fu, Weilong, Gao, Yuan, Karri, Sashank, Kannan, Anand, Moretti, Antonio, AlQuraishi, Mohammed, Keasar, Chen, Pe'er, Itsik
Proteins are the major building blocks of life, and actuators of almost all chemical and biophysical events in living organisms. Their native structures in turn enable their biological functions, which have a fundamental role in drug design. This motivates predicting the structure of a protein from its sequence of amino acids, a fundamental problem in computational biology. In this work, we demonstrate state-of-the-art protein structure prediction (PSP) results using embeddings and deep learning models for prediction of backbone atom distance matrices and torsion angles. We recover 3D coordinates of backbone atoms and reconstruct the full-atom protein by optimization. We create a new gold standard dataset of proteins which is comprehensive and easy to use. Our dataset consists of amino acid sequences, Q8 secondary structures, position-specific scoring matrices, multiple sequence alignment co-evolutionary features, backbone atom distance matrices, torsion angles, and 3D coordinates. We evaluate the quality of our structure prediction by RMSD on the latest Critical Assessment of Techniques for Protein Structure Prediction (CASP) test data, and demonstrate results that are competitive with the winning teams and AlphaFold in CASP13 and that surpass the results of the winning teams in CASP12. We make our data, models, and code publicly available.
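Two quantities named in the abstract above, the backbone distance matrix and RMSD, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the RMSD here is computed after Kabsch superposition, which is a common convention but is not specified in the abstract, and the coordinates are random toy data rather than real backbone atoms. The function names distance_matrix and kabsch_rmsd are hypothetical.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kabsch_rmsd(pred, ref):
    """RMSD between two (N, 3) arrays after optimal rigid superposition."""
    # Remove translation by centering both point sets.
    pred_c = pred - pred.mean(axis=0)
    ref_c = ref - ref.mean(axis=0)
    # Kabsch algorithm: SVD of the 3x3 covariance matrix.
    h = pred_c.T @ ref_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = pred_c @ rot.T
    return float(np.sqrt(((aligned - ref_c) ** 2).sum() / len(ref)))

# Toy data (assumption): 10 random points standing in for backbone atoms.
rng = np.random.default_rng(0)
ref = rng.normal(size=(10, 3))

# A rotated and translated copy of the reference, plus a noisy variant.
theta = 0.9
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = ref @ rot_z.T + np.array([3.0, -1.0, 2.0])
noisy = moved + rng.normal(scale=0.1, size=ref.shape)

rmsd_exact = kabsch_rmsd(moved, ref)   # rigid motion only
rmsd_noisy = kabsch_rmsd(noisy, ref)   # rigid motion plus perturbation
```

The distance matrix is unchanged by the rigid motion, which is one reason it is a convenient prediction target, but it is also unchanged by a mirror image. The determinant check in kabsch_rmsd restricts the superposition to proper rotations, so a mirrored structure does not score as a perfect match.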