A new AI tool could help researchers discover previously unknown proteins and design entirely new ones. When harnessed, it could help unlock the development of more efficient vaccines, speed up research toward a cure for cancer, or lead to completely new materials.
Alphabet-owned AI lab DeepMind took the world by surprise in 2020 when it announced AlphaFold, an AI tool that used deep learning to solve one of the “grand challenges” of biology: accurately predicting the shapes of proteins. Proteins are fundamental to life, and understanding their shape is vital to working with them. Earlier this summer DeepMind announced that AlphaFold could now predict the shapes of all proteins known to science.
The new tool, ProteinMPNN, described by a group of researchers from the University of Washington in two papers published in Science today, offers a powerful complement to that technology.
The papers are the latest example of how deep learning is revolutionizing protein design by giving scientists new research tools. Traditionally researchers engineer proteins by tweaking those that occur in nature, but ProteinMPNN will open an entirely new universe of possible proteins for researchers to design from scratch.
“In nature, proteins solve basically all the problems of life, ranging from harvesting energy from sunlight to making molecules. Everything in biology happens from proteins,” says David Baker, one of the scientists behind the paper and director of the Institute for Protein Design at the University of Washington.
“They evolved over the course of evolution to solve the problems that organisms faced during evolution. But we face new problems today, like covid. If we could design proteins that were as good at solving new problems as the ones that evolved during evolution are at solving old problems, it would be really, really powerful.”
Proteins consist of hundreds to thousands of amino acids that are linked up in long chains, which then fold into three-dimensional shapes. AlphaFold helps researchers predict the resulting structure, offering insight into how they will behave.
ProteinMPNN will help researchers with the inverse problem. If they already have an exact protein structure in mind, it will help them find the amino acid sequence that folds into that shape. The system uses a neural network trained on a very large number of examples of amino acid sequences and the three-dimensional structures they fold into.
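To make the difference between the forward and inverse problems concrete, here is a toy sketch in Python. It is not ProteinMPNN and does not model real folding: the "structure" is reduced to a hypothetical pattern of buried (B) and exposed (E) positions, and the helper names `fold_pattern` and `design_sequence` are invented for illustration. The only assumption borrowed from biochemistry is the rough rule of thumb that hydrophobic residues tend to sit in a protein's interior.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
HYDROPHOBIC = set("AVILMFWC")         # toy assumption: these residues end up buried


def fold_pattern(sequence):
    """Forward problem (toy): map a sequence to its buried/exposed pattern."""
    return "".join("B" if aa in HYDROPHOBIC else "E" for aa in sequence)


def design_sequence(target_pattern, rng):
    """Inverse problem (toy): sample a sequence that produces the target pattern."""
    buried_choices = sorted(HYDROPHOBIC)
    exposed_choices = [aa for aa in AMINO_ACIDS if aa not in HYDROPHOBIC]
    return "".join(
        rng.choice(buried_choices if spot == "B" else exposed_choices)
        for spot in target_pattern
    )


rng = random.Random(0)   # fixed seed so the run is repeatable
target = "BEEBBEEBEB"    # hypothetical target "shape"
designs = [design_sequence(target, rng) for _ in range(3)]

for seq in designs:
    # Each design is different, yet each maps back to the same target pattern.
    print(seq, fold_pattern(seq) == target)
```

Even in this simplified setting, many different sequences satisfy one target, which is why the inverse problem is a search for good candidates rather than a lookup of a single answer.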
But researchers also need to solve another issue. To design proteins that are useful for real-world applications, such as a new enzyme that digests plastic, they first have to figure out what protein backbone would have that function.
To do that, researchers in Baker’s lab use two machine-learning methods, detailed in an article in Science last July, that the team calls “constrained hallucination” and “inpainting.”
“Constrained hallucination” lets users do a random search among all possible protein sequences and favor sequences with certain functions. This “hallucination” makes it possible to explore the space of all possible protein structures, thanks to machine learning’s ability to crunch vast data sets. There are 20 amino acids, which can be combined into a massive number of possible sequences.
“Nature has only sampled … a tiny fraction. So if you limited the search to those sequences that exist in nature, you wouldn’t get anywhere,” Baker says.
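A rough sense of the numbers, and of what a search that favors certain sequences looks like, can be sketched in Python. This is a toy illustration only, not the Baker lab's method: the scoring function, the position constraints, and the names `sequence_space_size`, `toy_score`, and `constrained_search` are all hypothetical, and the search is a simple greedy random-mutation loop rather than anything driven by a neural network.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


def sequence_space_size(length):
    """Number of distinct sequences of a given length: 20 ** length."""
    return len(AMINO_ACIDS) ** length


def toy_score(sequence, constraints):
    """Hypothetical objective: count positions holding the required residue."""
    return sum(1 for pos, aa in constraints.items() if sequence[pos] == aa)


def constrained_search(length, constraints, steps, rng):
    """Greedy random-mutation search that keeps changes which do not lower the score."""
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = toy_score(seq, constraints)
    for _ in range(steps):
        pos = rng.randrange(length)
        previous = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)   # propose a random single mutation
        candidate = toy_score(seq, constraints)
        if candidate >= best:
            best = candidate                 # keep neutral or improving mutations
        else:
            seq[pos] = previous              # revert mutations that break a constraint
    return "".join(seq), best


# A protein only 100 amino acids long already has about 1.27e130 possible sequences.
print(f"{sequence_space_size(100):.2e}")  # prints 1.27e+130

# Made-up constraints standing in for "sequences with certain functions".
constraints = {3: "H", 7: "D", 12: "S"}
sequence, score = constrained_search(20, constraints, steps=20000, rng=random.Random(0))
print(sequence, score)
```

The unconstrained positions keep drifting at random throughout the run, which mirrors the idea of exploring a vast space while holding on to the features that matter.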
“Inpainting” works much like autocomplete in a word processor, but for protein structures and sequences. Using these methods, the researchers can create a completely new protein that hasn’t been seen in nature before, such as a giant ring-like structure.
Baker’s team is experimenting with whether those ring-like structures could be used as components of tiny machines that operate at the nanoscale. In the future, these nanomachines could be used to unclog arteries, for example.
The ability to use machine learning to design proteins in this way is “a very big deal,” says Lynne Regan, professor of biochemistry and biotechnology at the University of Edinburgh.
Machine learning will make the whole process a lot quicker and easier, and will allow researchers to create completely new proteins and structures on a much larger scale. The software is more than 200 times faster than the previous best tool and requires minimal user input, potentially lowering the barriers to entry for protein design.
“These contributions and others recently are transforming the field of biomolecular structure prediction and design,” says Jeffrey Gray, a professor of chemical and biomolecular engineering at Johns Hopkins University.
“The implications are dramatic in terms of understanding biology, health, and disease and in designing new molecules to reduce human suffering,” Gray says.
Gray says his lab will combine deep-learning tools they developed with ones from the Baker lab to better understand the immune system and immune-related diseases, and use AI to design therapeutics.
“AlphaFold launched biology into a new era by solving the protein structure predicting problem and demonstrating the transformative role that AI and [machine learning] will play in biology,” says Pushmeet Kohli, the head of DeepMind’s AI for Science team. “ProteinMPNN is another proof of this paradigm shift, designing proteins for specific tasks.”
ProteinMPNN, which is now available free on the open-source software repository GitHub, will give researchers the tools to make unlimited new designs. “The challenge, of course … is what are you going to design?” Baker says.
Correction: A previous version of this story stated that proteins consist of hundreds of thousands of amino acids, when in fact they consist of hundreds to thousands of amino acids. Sorry.