Taipale Group - Starting January 2025

Molecular mechanisms of gene control

We will explore two fundamental questions: what are the rules that specify how DNA sequence determines when and where genes are expressed, and what are the mechanisms that control how tissues and organisms grow? By solving the 'sequence to expression' problem and understanding the mechanisms of growth control, we hope to power research that transforms genome information into biological insights and medical benefits.

We will combine high-throughput wet-lab experiments, synthetic genomes and DNA sequencing with dry-lab computation and artificial intelligence to understand the molecular mechanisms that control gene activity. By applying our insights to large data sets (e.g. cancer genomes and genome-wide association studies), we will identify genomic regions associated with a wide variety of diseases, including heart disease, diabetes and different types of cancer.

Our approach

Our research will explore one of the fundamental problems remaining in bioscience: the ‘Sequence to Expression’ conundrum. To understand the interplay of DNA, transcription factors and tissue-specific enhancer elements that regulate gene expression – the second genetic code – we will apply three complementary approaches:

  • biochemical – based on measuring affinities of DNA-binding proteins to all possible DNA sequences, followed by building physically realistic models of gene expression
  • machine-learning – based on learning the full sequence-to-activity map from large-scale functional genomics data
  • genome-sequence based computation – based on predicting protein structures and affinities from sequence, and gene expression from these parameters.
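
To make the biochemical approach above more concrete, here is a minimal sketch of one common textbook way to turn measured binding preferences into a physical model. It is an illustration only, not the group's actual method. It assumes that each base in a binding site contributes independently to the binding energy, and that site occupancy follows a simple two-state Boltzmann form. The energy values, the 4-bp site width and all function names are hypothetical.

```python
import math

# Hypothetical per-position binding energies (in units of kT) for a 4-bp site.
# 0.0 marks the preferred base at each position; the values are illustrative,
# not measured data.
ENERGY_MATRIX = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 2.0, "T": 0.0},
]


def site_energy(site):
    """Sum the independent per-base energy contributions for one site."""
    if len(site) != len(ENERGY_MATRIX):
        raise ValueError("site length must match the energy matrix width")
    return sum(ENERGY_MATRIX[i][base] for i, base in enumerate(site))


def occupancy(site, mu=0.0):
    """Two-state occupancy 1 / (1 + exp(E - mu)).

    mu is a hypothetical chemical potential (in kT) standing in for the
    concentration of the DNA-binding protein.
    """
    return 1.0 / (1.0 + math.exp(site_energy(site) - mu))


def best_site(sequence):
    """Return the lowest-energy (strongest) window in a longer sequence."""
    width = len(ENERGY_MATRIX)
    windows = [sequence[i:i + width] for i in range(len(sequence) - width + 1)]
    return min(windows, key=site_energy)


if __name__ == "__main__":
    for site in ("ACGT", "ACGA", "TTTT"):
        print(site, site_energy(site), round(occupancy(site), 3))
    print("strongest window:", best_site("TTACGTTT"))
```

In this sketch a single mismatch raises the energy of the site and lowers its predicted occupancy. Real models would also need to account for effects such as nucleosomes, cooperativity between proteins and dependencies between neighbouring bases, which are omitted here.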

Overall, our approach will be to study individual protein-DNA interactions in the presence of nucleosomes and/or cellular extracts, and at the systems level, measuring activity of elements in a massively parallel fashion. We will then apply predictive computational modelling using both machine learning and interpretable physical models to analyse the data and to build a generative model of gene expression.

This approach will enable us to divide the ‘sequence to expression’ problem into a set of smaller questions that we will answer by:

  • coordinated, large-scale high-throughput experimentation
  • developing cutting-edge technologies for scalable data generation
  • creating and employing synthetic biology
  • automating our experiments to deliver at scale
  • developing advanced computational tools to analyse and decipher the resulting vast volume of genomic data.

Aims

Our work will seek to:

  • Determine binding specificities of transcription factors on free and nucleosomal DNA in the absence and presence of cytosine methylation.
  • Predict affinity of TFs to DNA based on their protein sequence.
  • Measure activity of TFs and regulatory elements in vivo.
  • Build models to understand transcriptional regulation based on the biochemical parameters.