David Baker (Seattle, Washington, United States, 1962), with a PhD in Biochemistry from the University of California, Berkeley, is currently the Director of the Institute for Protein Design, a Howard Hughes Medical Institute Investigator, the Henrietta and Aubrey Davis Endowed Professor in Biochemistry, and an adjunct professor of genome sciences, bioengineering, chemical engineering, computer science, and physics at the University of Washington. Author of more than 570 research papers – with over 142,000 citations and an h-index of 201 – he holds more than 100 patents, has co-founded 11 firms and is the Director of Rosetta Commons, a consortium of labs and researchers that develop biomolecular structure prediction and design software.
Baker – a Professor of Biochemistry at the University of Washington and a Howard Hughes Medical Institute Investigator – developed the RoseTTAFold program, while Hassabis and Jumper – CEO and senior research scientist respectively at AI company DeepMind – are the creators of AlphaFold2. “Both computing methods,” the committee explains, “rely on a sophisticated machine-learning technique known as deep learning to predict the shape of proteins with unprecedented accuracy, similar to that of experimentally-determined structures, and with exceptional speed.”
“This breakthrough,” it concludes, “is revolutionizing our understanding of how the amino acid sequence of proteins leads to uniquely ordered three-dimensional structures. Scientists are now using these new methods to predict protein conformations, design entirely new proteins and identify novel drug targets.”
“Until now,” said committee secretary Óscar Marín, “it took years of arduous lab work to predict the structure of even a single protein, but with the advances achieved by the three awardees we now need just a few minutes on the computer.” For the Director of the Medical Research Council Centre for Neurodevelopmental Disorders at King’s College London, thanks to the work done by Baker, Hassabis and Jumper “we are going to make far faster progress in future in developing treatments for multiple diseases.”
A technological “shortcut” to predict the structure of proteins
The DNA of our cells contains all the instructions we need to develop, survive and reproduce. But proteins are the workhorses that keep all these functions going, and it is their three-dimensional structure that determines their exact mission.
To know the specific role a protein fulfils, it is not enough to know the DNA sequence encoding it, or even to identify the amino acid sequence into which the genetic information is translated. The key to understanding how a protein will act lies in the arrangement in space it adopts through folding, but deciphering this in the lab is a slow and rather scattergun process. And predicting its function from its chemical composition is likewise a complex and uncertain task.
“Scientists always assumed that it was just too hard to understand how proteins fold. To try and deduce it from the underlying physical principles, you need a vast quantity of computing resources to even guess at their most stable form,” explained Dario Alessi, a committee member and head of the MRC Protein Phosphorylation and Ubiquitylation Unit at Dundee University (United Kingdom), shortly after the decision was reached. “But the awardees have come up with an AI-driven shortcut using a deep-learning technique.”
“I believe AlphaFold represents really the first powerful example of how deep learning is able to capture the complexity of biological systems and really develop mathematical understandings of extraordinarily complex things,” declared Jumper in an interview granted after hearing of the award. “It is very, very difficult to handle the extraordinary complexity that you see in a living cell, but I think with this technology we can really capture that complexity.”
“AlphaFold has already made a huge impact on biological research in quite a short space of time,” adds fellow laureate Demis Hassabis. “We know that over a million researchers have used the structures predicted by AlphaFold in their research, and pretty much every pharma company in the world has been using AlphaFold in their drug discovery programs.”
“De novo” proteins to block viruses and cancer cells
As well as predicting how naturally occurring proteins will fold, the RoseTTAFold program led by David Baker has also proved able to design completely new proteins based on a simple description of their target functions. The program can thus obtain proteins to block not only flu virus or COVID-19 proteins, but also cancer cells, and its results have been successfully tested in the lab.
“New proteins can be improved medicines, so there are many new and exciting medical applications, for example, creating new vaccines or new cancer treating medications,” Baker explains. Some decades back, this American biochemist and computational biologist began exploring ways to deduce the structure of proteins guided by the principles of physics, and wrote his findings into an algorithm known as Rosetta. The method performed fairly well with small proteins but demanded large computational resources and expert knowledge to get it working properly.
In parallel, Demis Hassabis and John Jumper decided to use artificial intelligence to solve the problem in a quicker, more accessible way. Jumper led a team using available deep-learning tools and vast quantities of data on the sequences and structures of known proteins, and set to work training the neural network.
This first iteration, which they called AlphaFold, was launched in 2018. “We had the best system in the world at the time,” says Jumper, “but it was still far, far off from what we knew was the kind of accuracy needed to be really experimentally relevant.”
They accordingly set to work to design a better system. Starting from scratch, they decided to take all the knowledge they possessed on how proteins fold and feed it into the neural network. So as well as the information provided by known proteins, the network also had some knowledge about the folding mechanism built into its design.
“This enabled the network to learn dramatically more efficiently from the existing data,” Jumper affirms. In December 2020 they entered the new tool, AlphaFold2, in an international challenge where it would have to prove itself against competing systems. Its resounding success went far beyond the researchers’ expectations. AlphaFold2 achieved in a few short days what would have taken years of work in the lab.
When announcing AlphaFold2, Jumper had outlined some of its underlying concepts, and Baker was quick to take note. “We started having meetings every week in my group,” he recalls, “and we started to systematically go through different ideas and experimenting, and that ultimately led to RoseTTAFold.”
The product was launched a few months later. The level of accuracy was comparable to that of AlphaFold2, plus it came with an added functionality. Not only could it reliably predict a protein’s structure from its amino acid sequence in hours or even minutes, it could also run the process in reverse, determining the corresponding amino acid sequence for a protein of a given shape.
Open source tools for the biomedical research community
Nowadays both RoseTTAFold and AlphaFold2 are freely available to the scientific community, and recent upgrades have practically equalized the computing times required by each.
Although these AI tools have not entirely supplanted experimental methods, they have taken a firm place alongside them, revolutionizing the whole of biology. So much so that Dario Alessi describes them as “the first real demonstration of how artificial intelligence will transform the field.”
He recalls that his own laboratory had spent three years unraveling the structure of the PPM1H protein through experimental techniques when AlphaFold came along. “We had the structure and were just about to publish it when AlphaFold appeared. Out of curiosity we compared the structures and they were totally identical, not a single significant difference in 547 amino acids,” he relates, still astounded at the program accomplishing in minutes what had taken years of work.
Thanks to these tools, almost all documented proteins – not only human but those of animals, plants and even bacteria – have yielded up their structural secrets. And this knowledge will find immediate application in the creation of new drugs and vaccines.
“We have already seen AlphaFold being applied to a huge range of problems,” says Hassabis. “Some of the things we’re most excited about it being used for are drug discovery, for example, to combat antibiotic resistance, or to try and find cures for diseases like malaria.”
Jumper, in fact, has collaborated with a University of Oxford research group working on a malaria vaccine. Most vaccines contain fragments of the protein of the infectious agent, but to decide which fragment is best, you need to know the structure of the candidate protein. The Oxford team, says Jumper, “were unsure about the structure of the protein they needed, and this was stopping them from figuring out the right construct. They used AlphaFold to predict the structure, so were able to understand which fragments might work and how to make a vaccine from them.”
Computational biologist Gonzalo Jiménez Osés, Principal Investigator at CIC bioGUNE in Bilbao and one of the nominators of the new laureates, explains one of the most promising facets of this contribution in the biomedicine area: “Among AlphaFold’s successes has been to integrate the vast amount of genetic and structural information contributed by scientists over the decades to open access databanks into an advanced neural network together with a sophisticated machine-learning algorithm, and one immediate byproduct will be in new drug design. In classic drug development, we will certainly discover novel therapeutic targets, but, more important still, we will rapidly arrive at a more precise understanding of the network of protein interactions occurring in diseases such as cancer and immune system disorders, and this will lead to new treatments, because computer simulations of these complex processes will be far more reliable.”
The revolution in purpose-designed proteins for more sophisticated medications
For the moment, the biggest impact for new vaccine and drug creation lies in the design of proteins à la carte. The latest RoseTTAFold version even allows us to create proteins from simple descriptions. “It’s like DALL-E but for proteins,” Baker explains, referring to the AI system where users can generate images from simple text prompts. “So for example, you can tell RoseTTAFold: design a protein which blocks this flu virus protein, or design a protein which will block these cancer cells. RoseTTAFold will then make those proteins. We’ve made them in the lab, and we find that they have exactly those functions.”
An anti-coronavirus vaccine created with RoseTTAFold is now being used in South Korea. And new purpose-designed anti-cancer medicines are being tested in human clinical trials. There are even plans to develop a nasal spray that protects against COVID and other respiratory viruses.
“We believe that almost all of medicine will be transformed by the protein design revolution,” says Baker. “Most medicines today are made by making small modifications to the proteins which already exist in nature. Now that we can design completely new proteins, we can develop much more improved, more sophisticated medicines that, for example, can treat cancer without the side effects, be made very quickly upon the outbreak of a new pandemic, and in general will be more precise and more robust.”