The story so far: DeepMind, a company based in London and owned by Google, announced this week that it had predicted the three-dimensional structures of more than 200 million proteins using AlphaFold. This covers nearly every protein known to scientists today.
AlphaFold is an AI-based protein structure prediction tool. It is based on a computer system called a deep neural network. Loosely inspired by the human brain, neural networks learn from large amounts of input data to produce the desired output. The real work is done by the black box between the input and output layers, made up of what are called hidden layers. AlphaFold is fed protein sequences as input. When protein sequences enter through one end, the predicted three-dimensional structures come out through the other. It is like a magician pulling a rabbit out of a hat.
AlphaFold uses a process based on “training, learning, retraining and relearning.” The first step uses the available structures of 1,70,000 proteins in the Protein Data Bank (PDB) to train the computer model. The trained model is then used to predict the structures of proteins not in the PDB. Once that is done, the high-accuracy predictions from this step are used to retrain the model, further improving the accuracy of its predictions. Using this method, AlphaFold has now predicted the structures of the 214 million unique protein sequences deposited in the Universal Protein Resource (UniProt) database.
Proteins are the business ends of biology, meaning proteins carry out nearly all the functions inside a living cell. Therefore, knowing protein structure and function is essential to understanding human diseases. Scientists determine protein structures experimentally using X-ray crystallography, nuclear magnetic resonance spectroscopy, or cryogenic electron microscopy. These techniques are time-consuming, often taking years, and rely largely on trial and error. The development of AlphaFold changes all of that. It is a watershed moment in science, and in structural biology in particular.
AlphaFold has already helped hundreds of scientists accelerate their discoveries in vaccine and drug development since the database was first released publicly nearly a year ago.
From the seminal contribution of G. N. Ramachandran in understanding protein structures to the present day, India is no stranger to the field and has produced some fine structural biologists. The Indian structural biology community is strong and skilled. It needs to quickly take advantage of the AlphaFold database and learn how to use the structures to design better vaccines and drugs. This is especially important in the present context. Knowing the accurate structures of the proteins of SARS-CoV-2, the virus that causes COVID-19, in days rather than years will accelerate vaccine and drug development against the virus.
India will also need to speed up its implementation of public-private partnerships in the sciences.
The public-private partnership between the European Molecular Biology Laboratory’s European Bioinformatics Institute and DeepMind made the 25-terabyte AlphaFold dataset accessible to everyone in the scientific community at no cost.
Learning from this, India could foster collaborations that bring together the computing hardware and data science talent of the private sector with specialists in academic institutions, paving the way for data science innovations.
Although a tour de force in structural biology, AlphaFold, like any other method, is neither flawless nor the only AI-based protein structure prediction tool. RoseTTAFold, developed by David Baker at the University of Washington in Seattle, U.S., is another. It is less accurate than AlphaFold, but it can predict the structures of protein complexes.
The development of AlphaFold is sure to make many scientists feel vulnerable, especially when they compare years of hard work in the lab with the output of a computer system. However, this is the time to adjust and take advantage of the new reality.
Doing this will reinvigorate scientific research and accelerate discovery.
Binay Panda is a Professor at Jawaharlal Nehru University, New Delhi