For several years, John McGeehan, a biologist and the director of the Center for Enzyme Innovation in Portsmouth, England, has been looking for a molecule that could break down the 150 million tons of soda bottles and other plastic waste scattered around the world.
Working with researchers on both sides of the Atlantic, he has found a few good options. But his task is that of the most demanding locksmith: to pinpoint the chemical compounds that, on their own, twist and fold into a microscopic shape that fits perfectly into the molecules of a plastic bottle and splits them apart, like a key opening a door.
Determining the exact chemical content of a given enzyme is a fairly simple challenge these days. But identifying its three-dimensional shape can take years of biochemical experimentation. After reading last fall that an artificial intelligence laboratory in London called DeepMind had developed a system that automatically predicts the shapes of enzymes and other proteins, Dr. McGeehan asked the lab if it could help with his project.
Toward the end of a workweek, he sent DeepMind a list of seven enzymes. The following Monday, the lab returned shapes for all seven. “This has brought us a year, if not two years, ahead,” said Dr. McGeehan.
Now any biochemist can speed up their work in a similar way. On Thursday, DeepMind released the predicted shapes of more than 350,000 proteins – the microscopic mechanisms that control the behavior of bacteria, viruses, the human body and all other living things. This new database contains the three-dimensional structures of all proteins expressed in the human genome, as well as those of proteins found in 20 other organisms, including the mouse, fruit fly and E. coli bacterium.
This huge and detailed biological map – containing roughly 250,000 previously unknown shapes – could improve the ability to understand diseases, develop new drugs, and repurpose existing drugs. It may also lead to new kinds of biological tools, such as an enzyme that efficiently breaks down plastic bottles and converts them into materials that can be easily reused and recycled.
“This can move you forward in time – influence the way you think about problems and help solve them faster,” said Gira Bhabha, assistant professor in the Department of Cell Biology at New York University. “Whether you’re studying neuroscience or immunology – whatever your specialty in biology – it can be useful.”
This new knowledge is a key of its own: if scientists can determine the shape of a protein, they can determine how other molecules bind to it. This could show, for example, how bacteria resist antibiotics – and how this resistance can be counteracted. Bacteria resist antibiotics by expressing certain proteins; if scientists could identify the shapes of these proteins, they could develop new antibiotics or new drugs that suppress them.
In the past, determining the shape of a protein took months, years, or even decades of trial-and-error experiments with x-rays, microscopes, and other bench-top tools. But DeepMind can significantly shorten the timeline with its AI technology known as AlphaFold.
When Dr. McGeehan sent DeepMind his list of seven enzymes, he told the lab that he had already identified shapes for two of them, but he didn’t say which two. This was a way of testing how well the system worked; AlphaFold passed the test and correctly predicted both shapes.
More remarkable, Dr. McGeehan said, was that the predictions arrived within a few days. He later learned that AlphaFold had actually done the job in just a few hours.
AlphaFold predicts protein structures using what is known as a neural network, a mathematical system that can learn tasks by analyzing huge amounts of data – in this case thousands of known proteins and their physical shapes – and extrapolating into the unknown.
This is the same technology that recognizes the commands you speak into your smartphone, recognizes faces in the photos you post on Facebook, and translates one language into another on Google Translate and other services. However, many experts believe that AlphaFold is one of the most powerful uses of the technology.
“It shows that AI can do useful things amidst the complexities of the real world,” said Jack Clark, one of the authors of the AI Index, an effort to track the advancement of artificial intelligence around the world.
As Dr. McGeehan discovered, it can be remarkably accurate. AlphaFold can predict the shape of a protein with an accuracy that can rival physical experiments 63 percent of the time, according to independent benchmark tests that compare its predictions to known protein structures. Most experts had assumed that such a powerful technology was years away.
“I thought it would be another 10 years,” said Randy Read, a professor at the University of Cambridge. “That was a complete change.”
However, the accuracy of the system varies, so some of the predictions in the DeepMind database are less useful than others. Each prediction in the database comes with a “confidence level” that indicates how accurate it is likely to be. DeepMind researchers estimate that the system delivers a “good” prediction about 95 percent of the time.
This means that the system cannot completely replace physical experiments. It is used alongside bench-top work, helping scientists figure out which experiments to conduct and filling in the gaps when experiments are unsuccessful. With AlphaFold’s help, researchers at the University of Colorado Boulder recently identified a protein structure that they had struggled to pin down for more than a decade.
DeepMind’s developers chose to share their database of protein structures freely, rather than sell access, in hopes of fueling advances in the biological sciences. “We’re interested in maximum impact,” said Demis Hassabis, CEO and co-founder of DeepMind, which is owned by the same parent company as Google but acts more like a research lab than a commercial company.
Some scientists have compared DeepMind’s new database to the Human Genome Project. The Human Genome Project, completed in 2003, provided a map of all human genes. Now DeepMind has provided a map of the roughly 20,000 proteins expressed by the human genome – another step in understanding how our bodies work and how to respond when things go wrong.
The hope is also that the technology will continue to develop. A University of Washington lab built a similar system called RoseTTAFold, and like DeepMind, it has openly shared the computer code that powers its system. Anyone can use the technology and anyone can work to improve it.
Even before DeepMind began to openly share its technology and data, AlphaFold was feeding a multitude of projects. Researchers at the University of Colorado are using the technology to understand how bacteria such as E. coli and Salmonella develop resistance to antibiotics and to find ways to combat this resistance. At the University of California at San Francisco, researchers used the tool to improve their understanding of the coronavirus.
The coronavirus wreaks havoc on the body through 26 different proteins. With the help of AlphaFold, researchers have improved their understanding of one key protein and hope the technology can help improve their understanding of the other 25.
Even if this comes too late to affect the current pandemic, it could help prepare for the next one. “A better understanding of these proteins will help us target not only this virus, but other viruses as well,” said Kliment Verba, one of the researchers in San Francisco.
The possibilities are many. After DeepMind gave Dr. McGeehan shapes for seven enzymes that could potentially rid the world of plastic waste, he sent the lab a list of 93 more. “They’re working on it now,” he said.