New research further translates the language of the genome

Research into transcription factors deepens understanding of the ‘language’ of the genome, offering insights into human development.

New research has uncovered more about the complexity of human gene regulation by identifying DNA sequence patterns recognised by pairs of proteins called transcription factors, which bind to DNA and regulate the expression of human genes.

In a study published today (9 April) in Nature, researchers from the Wellcome Sanger Institute, the University of Cambridge and their collaborators explored how DNA-guided transcription factors interact with each other.

This research adds to the groundwork of understanding the complex language of the gene regulatory code, and how DNA sequence patterns located close to our genes influence human development and disease risk.

Each gene has a regulatory region that contains instructions on when and where the gene is expressed. This information is written in a code that is read by transcription factors, which bind to specific DNA sequences and either increase or decrease the gene’s expression.

Previous research has explored the ‘language’ of the genome – the regulatory code that controls gene expression. It found that cooperation between multiple transcription factors is a key feature of transcription factor-DNA binding, with DNA actively facilitating interactions between various transcription factors [1]. The regulatory code is far more complex than the genetic code, which explains how DNA sequence determines the structure of proteins. Researchers are therefore aiming to understand the regulatory language in more detail, focusing on the ‘words’ and ‘grammar’ – such as the transcription factors – that influence when and where genes are expressed.

This deeper understanding is crucial for uncovering how cells develop into specific types, how organs form and where they are located in the body during embryonic development, and for understanding what goes wrong in disease.

The interactions between transcription factors guided by DNA are poorly understood. In a new study, researchers from the Sanger Institute and the University of Cambridge used two novel algorithms to analyse more than 58,000 pairs of transcription factors from human cells. They did this to identify how and where transcription factors interact with each other, bolstering understanding of the genomic language [2].

The researchers’ results reveal new patterns and preferences in how certain transcription factors interact with each other – also known as ‘motifs’. In this study, the researchers estimate that they identified between 18 and 47 per cent of all human transcription factor pair motifs, greatly adding to their understanding of the regulatory code.

The team found that certain motifs they identified are present in developmental enhancers – DNA regulatory elements that activate transcription of a gene – that control important stages such as the development of fingers. For example, the research notes that certain sequences of transcription factor motifs, or ‘words’ in the language, influence whether or not someone develops polydactyly – too many fingers – or syndactyly – a fusion of fingers.

The findings also have implications for how scientists will use computational models – such as artificial intelligence – to predict protein structures in the future. Whilst these tools can predict the overall structure, they often cannot look into smaller details, such as how transcription factors interact with each other on DNA. These small interactions can have a big impact on human development, but computational models cannot always predict this. The researchers hope that future models will be able to incorporate the more minute transcription factor details to better predict protein structure and protein-DNA interactions.

This research marks a step forward in studying the smaller ‘words’ in the language of gene expression. By identifying small but key motifs in the genome, this research will help scientists understand and interpret the mechanisms influenced by transcription factors, particularly in the non-coding regions of the genome. These regions – which make up 99 per cent of the genome – do not code for proteins but still play a significant role in the regulation of gene expression and in the risk of developing disease.

 

“By gaining a deeper understanding of how transcription factors interact when guided by DNA, we hope our research will shed light on the molecular basis of the regulatory code, particularly in the context of developmental disorders. These interactions are evolutionarily conserved across mammals and offer several advantages in development, from incorporating positional information to creating sharper gene expression responses. With advanced insights into the regulatory code, we are excited to help drive future research that will improve our understanding of human development and developmental disorders.”

Dr Ilya Sokolov, an author of the study at the Wellcome Sanger Institute

“The human genome’s regulatory code is very complex, far more complex than the genetic code, and this research into transcription factor interactions unlocks deeper insights into the ‘language’ of the genome. Not only does our study provide more information into patterns of human development but it paves the way for future work with computational models that can hopefully incorporate these new data to better understand gene regulation.”

Professor Jussi Taipale, senior author of the study and Group Leader at the Wellcome Sanger Institute

More information

Notes to Editors

  1. Arttu Jolma, Yimeng Yin, Kazuhiro R. Nitta, Kashyap Dave, Alexander Popov, Minna Taipale, Martin Enge, Teemu Kivioja, Ekaterina Morgunova, Jussi Taipale. (2015) ‘DNA-dependent formation of transcription factor pairs alters their binding specificity.’ Nature. DOI: 10.1038/nature15518
  2. The researchers expressed a set of human transcription factors – enriched in proteins that are conserved in mammals – in Escherichia coli, combined them into a total of 58,754 transcription factor (TF) pairs and analysed their interactions by CAP-SELEX (consecutive affinity-purification systematic evolution of ligands by exponential enrichment). CAP-SELEX is a method that enables the discovery of TF-TF-DNA binding preferences.

Publication

Zhiyuan Xie et al. (2025) ‘DNA-guided transcription factor interactions extend human gene regulatory code.’ Nature. DOI: 10.1038/s41586-025-08844-z

Funding

The research was part funded by Wellcome and the University of Cambridge. A full list of funders and acknowledgements can be found in the publication.