Wellcome Sanger Institute

Protein behaviour can be predicted with simple maths

There are more variations of a small protein than there are atoms in the entire universe. Despite the vast possibilities, new research shows predicting mutation effects on protein behaviour is much simpler than previously thought.

Mutations affecting protein stability follow remarkably simple rules, finds new research published in Nature (25 September).

The discovery could simplify how we predict protein behaviour, making it easier and faster to understand the molecular basis of diseases, develop drugs and design new proteins for industrial applications.

Researchers from the Wellcome Sanger Institute and the Centre for Genomic Regulation, Barcelona created thousands of protein variants by introducing different mutations. They tested how these changes affected protein stability and found that despite billions of possible mutation combinations, most follow predictable patterns, meaning complex models are not needed to predict their effects.

Proteins, the building blocks of life, are chains made up of 20 different types of smaller units called amino acids. Even small changes in their structure can lead to diseases such as Alzheimer’s, cystic fibrosis or cancer. As proteins get longer, the number of possible mutation combinations increases exponentially. For instance, a protein 100 amino acids long has more potential combinations than there are atoms in the entire universe (see note 1).
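The scale of that comparison can be checked with a short calculation. The sketch below assumes each of the 100 positions can hold any of the 20 amino acids, and uses 10^80 as the number of atoms in the observable universe, a commonly quoted rough estimate that does not come from this article.

```python
# Count the possible sequences for a protein of a given length.
AMINO_ACID_TYPES = 20   # stated in the article
PROTEIN_LENGTH = 100    # the example length used in the article

# Each position can independently be any of the 20 amino acids.
combinations = AMINO_ACID_TYPES ** PROTEIN_LENGTH

# Assumption: a commonly quoted rough estimate, not a figure from the article.
ATOMS_IN_UNIVERSE_ESTIMATE = 10 ** 80

print(f"Possible sequences: {combinations:.2e}")  # 1.27e+130
print(combinations > ATOMS_IN_UNIVERSE_ESTIMATE)   # True
```

Python integers have arbitrary precision, so the full 131-digit count is computed exactly before being rounded for display.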

Until now, scientists believed that the complex interactions of multiple mutations made predicting the impact of these changes nearly impossible without intricate models.

In this new study, researchers from the Wellcome Sanger Institute and the Centre for Genomic Regulation, Barcelona created thousands of versions of a protein by swapping out one or more of its amino acids. They measured how each mutation and combination of mutations affected how well the protein held its shape. The team found that most mutations acted independently, meaning their combined effects could be easily predicted, without the need for supercomputers or complex algorithms (see note 2).
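To give a feel for what "acting independently" could mean in practice, here is a minimal sketch in which the effect of a combination is simply the sum of the effects of its individual mutations. The mutation names, the numbers and the function `predict_combined_effect` are all invented for illustration; they are not data or code from the study, and the study's actual model may differ.

```python
# Hypothetical single-mutation effects on stability, in arbitrary units.
# Positive values destabilise, negative values stabilise.
# These numbers are invented for illustration, not taken from the study.
SINGLE_EFFECTS = {
    "A12G": 0.8,
    "L30P": 1.5,
    "K7E": -0.3,
}


def predict_combined_effect(mutations, single_effects=SINGLE_EFFECTS):
    """Predict a multi-mutant's effect by summing its single-mutation effects.

    This assumes the mutations do not interact with each other.
    """
    return sum(single_effects[m] for m in mutations)


double_mutant = predict_combined_effect(["A12G", "L30P"])
triple_mutant = predict_combined_effect(["A12G", "L30P", "K7E"])

print(f"Double mutant: {double_mutant:.1f}")
print(f"Triple mutant: {triple_mutant:.1f}")
```

Under this assumption, measuring each single mutation once is enough to estimate any combination, which is why the number of experiments needed would fall so sharply.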

The new findings could help doctors better understand and treat genetic diseases caused by multiple mutations in the same protein. By predicting how different combinations of mutations affect stability, clinicians could make more accurate diagnoses and develop personalised treatment plans. The findings could also advance industrial applications such as designing custom proteins for drug production or enzymes to break down plastics.

The findings may also lead to more efficient drug development. Some drugs, such as those used in Alzheimer’s disease, work by stabilising misfolded proteins. Researchers can now better identify which mutations are most destabilising and design molecules that counteract them.

While the findings can dramatically reduce the number of experiments needed, some level of experimental validation will still be necessary to confirm predictions, especially for critical applications like drug development where there may be unforeseen effects or rare interactions that the models do not capture.

“There are 17 billion different combinations of a protein that is 34 amino acids in length with only a single change allowed at each position. If it took just one second to test a single combination, we would need a total of 539 years to try them all. It is not a feasible experiment. This work opens up exciting new possibilities for protein design. For example, we could create enzymes that work faster and are more stable, which could be incredibly useful in areas like medicine and environmental protection.”

Dr Aina Martí Aranda, an author of the study, formerly at the Centre for Genomic Regulation, Barcelona, and now at the Wellcome Sanger Institute
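The arithmetic behind the quote can be reproduced in a few lines. The sketch below reads "a single change allowed at each position" as two options per position (the original amino acid or one alternative), which is an interpretation rather than something the article spells out, and it assumes a 365-day year.

```python
# Reproduce the "17 billion combinations, 539 years" estimate from the quote.
POSITIONS = 34            # protein length given in the quote
OPTIONS_PER_POSITION = 2  # assumption: original amino acid or one alternative

combinations = OPTIONS_PER_POSITION ** POSITIONS

# Assumption: a 365-day year and one second per test, as in the quote.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years_exact = combinations / SECONDS_PER_YEAR
years_rounded = 17_000_000_000 / SECONDS_PER_YEAR

print(f"Combinations: {combinations:,}")                  # 17,179,869,184
print(f"Years (exact count): {years_exact:.0f}")          # 545
print(f"Years (rounded to 17 billion): {years_rounded:.0f}")  # 539
```

Starting from the rounded figure of 17 billion gives the 539 years mentioned in the quote; the unrounded count gives a slightly higher value.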

“Our discovery turns an old understanding on its head, showing that the endless possibilities of protein mutations boil down to straightforward rules. We don’t need supercomputers to predict a protein’s behaviour – just good measurements and simple maths will do.”

Professor Ben Lehner, senior author of the study at the Wellcome Sanger Institute

More information

Notes to Editors:

This web story was adapted from the Centre for Genomic Regulation, Barcelona.

1. As proteins get longer, the number of different combinations rises exponentially. There are approximately 1.27×10^130 possible combinations for a protein that is 100 amino acids long. The vast majority of known proteins, especially those contributing to human disease, are much longer than 100 amino acids. With so many possibilities, experimentally testing each one to understand its impact on protein behaviour is practically impossible.

2. While the discovery is a significant advance, the researchers did not capture more complex interactions involving three or more mutations.

These data can be accessed here: https://github.com/lehner-lab/archstabms

Publication:

A.J. Faure et al. (2024) ‘The genetic architecture of protein stability.’ Nature. DOI: 10.1038/s41586-024-07966-0

Funding:

This research was supported by the European Research Council and the Spanish Ministry of Science and Innovation. For full funding acknowledgements, please refer to the publication.