Taipale Group - Starting January 2025

Molecular mechanisms of gene control

We will explore two fundamental questions: what are the rules that specify how DNA sequence determines when and where genes are expressed, and what are the mechanisms that control how tissues and organisms grow? By solving the 'sequence to expression' problem and understanding the mechanisms of growth control, we hope to power research that transforms genome information into biological insights and medical benefits.

We will combine high-throughput wet-lab biological experiments, synthetic genomes and DNA sequencing with dry-lab computation and artificial intelligence to understand the molecular mechanisms that control gene activity. By applying our insights to large data sets (e.g. cancer genomes and genome-wide association studies), we will identify genomic regions associated with a wide variety of diseases, including heart disease, diabetes and different types of cancer.

Our approach

Our research will explore one of the fundamental problems remaining in bioscience: the ‘Sequence to Expression’ conundrum. To understand the interplay of DNA, transcription factors and tissue-specific enhancer elements that regulate gene expression – the second genetic code – we will apply three complementary approaches:

biochemical – based on measuring affinities of DNA-binding proteins to all possible DNA sequences, followed by building physically realistic models of gene expression
machine-learning – based on learning the full sequence-to-activity map from large-scale functional genomics data
genome-sequence-based computation – based on predicting protein structures and affinities from sequence, and gene expression from these parameters.

Overall, our approach will be to study individual protein-DNA interactions in the presence of nucleosomes and/or cellular extracts and, at the systems level, to measure the activity of regulatory elements in a massively parallel fashion. We will then apply predictive computational modelling, using both machine learning and interpretable physical models, to analyse the data and to build a generative model of gene expression.

This approach will enable us to divide the ‘sequence to expression’ problem into a set of smaller questions that we will answer by:

coordinated, large-scale high-throughput experimentation
developing cutting-edge technologies for scalable data generation
creating and employing synthetic biology
automating our experiments to deliver at scale
developing advanced computational tools to analyse and decipher the resulting vast volume of genomic data.

Aims

Our work will seek to:

Determine binding specificities of transcription factors on free and nucleosomal DNA in the absence and presence of cytosine methylation.
Predict affinity of TFs to DNA based on their protein sequence.
Measure activity of TFs and regulatory elements in vivo.
Build models to understand transcriptional regulation based on the biochemical parameters.

Our people

Group lead

Prof Jussi Taipale

Incoming Group Leader - Joining in January 2025

I seek to solve one of the fundamental questions of biology: how do cells control which genes are switched on or off to enable the different tissues of the body to grow and develop? By discovering the switches, processes and feedback controls that govern the relationship between DNA sequence and gene expression, I hope to provide the foundations for new medical diagnostics and treatments for a wide variety of diseases, including cancer, heart disease and diabetes.

Related groups

Science group

Lehner Group

Programmable biology

We seek to lay the foundations for programmable biology. By combining genomics, biophysics, mechanistic modelling and artificial intelligence at scale, we ...

Science group

Parts Group

Understanding human DNA function by engineering

Our goal is to mechanistically understand impact of mutations in human DNA. To do so, we engineer DNA variation in cells, ...

Wellcome Sanger Institute

Programmes and Facilities

Programme

Generative and Synthetic Genomics

The Generative and Synthetic Genomics Programme combines large-scale data generation and artificial intelligence to lay the foundations for predictive and programmable ...
