Prof Jussi Taipale

Incoming Group Leader - Joining in January 2025

I seek to solve one of the fundamental questions of biology: how do cells control which genes are switched on or off to enable the different tissues of the body to grow and develop? By discovering the switches, processes and feedback controls that govern the relationship between DNA sequence and gene expression, I hope to provide the foundations for new medical diagnostics and treatments for a wide variety of diseases, including cancer, heart disease and diabetes.

Exploring the Second Genetic Code

I am fascinated by the ‘Sequence to Expression’ conundrum – one of the largest remaining problems in bioscience. We know that protein production in cells starts with the expression of genes into messenger RNA, and that this process is regulated by transcription factors binding to specific DNA sequences. However, humans have more than 1,600 transcription factors that act together to control gene expression, making the gene regulatory code far more complex than the genetic code that governs how RNA sequence is translated to protein sequence. Because of this complexity, we are unable to make full use of the wealth of genomic data being produced to make diagnoses or predictions.

Once we are able to understand this ‘second genetic code’ of transcription regulation and gene expression, we will be able to:

  • Better understand the mechanisms controlling cell growth in normal development and in uncontrolled growth (for example tumour formation).
  • Interpret cancer genomes and the genome-wide association study data of common, complex diseases (such as heart disease and diabetes).

Innovating to explore

To truly understand the second genetic code, we need to go beyond the basic DNA sequence and the first genetic code to decipher the intricately entwined language of regulatory elements and how they interact with transcription factors to form tissue-specific regulatory elements.

To address this, my team and I have developed and used a range of high-throughput experimental techniques such as HT-SELEX (High-Throughput Systematic Evolution of Ligands by Exponential Enrichment), ATI (Active Transcription Factor Identification) and genome editing tools to enable high-throughput genetic and genomic screening.

Our work is interdisciplinary and combines both wet-lab experimentation and dry-lab computational biology to provide insights. This includes the development of computational tools that can identify gene regulatory elements, and predict the effect of genetic variation on their activity.

Combining these approaches, we have explored the molecular mechanisms that control gene regulation and characterised transcription factor binding sites and specificities in humans and a range of experimental models, including mouse and Drosophila.

Using AI and scale

At the Sanger Institute, we will employ the Institute’s unique resources and capacities in high-throughput genetic and genomic experimentation, DNA sequencing, machine learning and computational analysis to decipher the regulatory genetic code and better understand the relationship between DNA sequence, protein sequence and binding affinity.

Our approach will be three-fold:

  • Biochemical – measuring parameters that enable building of a biophysical model of gene expression.
  • Generative – employing generative genomics to create massively parallel reporter assays that measure the activities of synthetic sequences in cells.
  • Computational – applying machine-learning algorithms to build a sequence-to-expression map.

Work with me

If you would like to partner with me or work in my team, please contact me.
