Scientists on Thursday unveiled the most exhaustive database yet of the proteins that form the building blocks of life, in a breakthrough observers said would "fundamentally change biological research". Every cell in every living organism is triggered to perform its function by proteins that deliver constant instructions to maintain health and ward off infection. Unlike the genome -- the complete sequence of human genes that encode cellular life -- the human proteome is constantly changing in response to genetic instructions and environmental stimuli. Understanding how proteins operate within cells -- and the shapes into which they "fold" -- has fascinated scientists for decades. But determining each protein's precise function through direct experimentation is painstaking.
Protein structures to represent the data obtained via AlphaFold. DeepMind and EMBL release the most complete database of predicted 3D structures of human proteins. Partners use AlphaFold, the AI system recognized last year as a solution to the protein structure prediction problem, to release more than 350,000 protein structure predictions including the entire human proteome to the scientific community. DeepMind today announced its partnership with the European Molecular Biology Laboratory (EMBL), Europe's flagship laboratory for the life sciences, to make the most complete and accurate database yet of predicted protein structure models for the human proteome. This will cover all 20,000 proteins expressed by the human genome, and the data will be freely and openly available to the scientific community.
DeepMind's AlphaFold represents the first time a significant scientific problem has been solved by AI. It can be difficult to distinguish between substance and hype in the field of artificial intelligence. In order to stay grounded, it is important to step back from time to time and ask a simple question: what has AI actually accomplished or enabled that makes a difference in the real world? This summer, DeepMind delivered the strongest answer yet to that question in the decades-long history of AI research: AlphaFold, a software platform that will revolutionize our understanding of biology. In 1972, in his acceptance speech for the Nobel Prize in Chemistry, Christian Anfinsen made a historic prediction: it should in principle be possible to determine a protein's three-dimensional shape based solely on the one-dimensional string of molecules that comprise it. Finding a solution to this puzzle, known as the "protein folding problem," has stood as a grand challenge in the field of biology for half a century.
Last month, DeepMind published the much-anticipated, detailed methodology underlying the latest version of AlphaFold – the UK-based science company's powerful AI system that blew away its rivals in the latest major competition to predict the 3D structure of proteins. AlphaFold's machine learning methodology has been applied to predict structures for almost 99% of human proteins, which have now been made publicly available. In this long read, I reflect on the significance of these developments for fundamental research and drug discovery. I wrote this as the ICR celebrates the 10th anniversary of its AI-enabled drug discovery knowledgebase canSAR – which features multiple approaches to predicting 'druggability' as an aid to selecting drug targets and accelerating drug discovery. The coronavirus pandemic has, understandably, soaked up a lot of bandwidth when it comes to science news – but one particular non-Covid science story was able to cut through and hit the headlines in the UK and around the world. On 30 November 2020 it was announced that DeepMind – a subsidiary of Google's parent company Alphabet focusing on artificial intelligence – had made what was hailed as a huge leap towards solving one of biology's greatest remaining challenges: the ability to predict the correct, three-dimensional structures of proteins based on their constituent, one-dimensional amino acid sequences. The announcement attracted huge interest, but the expert community has been waiting for the peer-reviewed science publication. The AI methodology has now been published in the leading journal Nature, and this was followed rapidly by a second Nature paper from DeepMind and collaborators at the European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), which reports the application of the most recent AlphaFold machine learning system to predict the 3D structures at scale for almost the entire human proteome – 98.5% of human proteins.
The human mediator complex has long been one of the most challenging multi-protein systems for structural biologists to understand. Credit: Yuan He. The human genome holds the instructions for more than 20,000 proteins. But only about one-third of those have had their 3D structures determined experimentally. And in many cases, those structures are only partially known. Now, a transformative artificial intelligence (AI) tool called AlphaFold, which has been developed by Google's sister company DeepMind in London, has predicted the structure of nearly the entire human proteome (the full complement of proteins expressed by an organism). In addition, the tool has predicted almost complete proteomes for various other organisms, ranging from mice and maize (corn) to the malaria parasite (see 'Folding options').