Taipale Group - Starting January 2025
Molecular mechanisms of gene control
We will combine high-throughput wet-lab experiments, synthetic genomes and DNA sequencing with dry-lab computation and artificial intelligence to understand the molecular mechanisms that control gene activity. By applying our insights to large data sets (e.g. cancer genomes and genome-wide association studies), we will identify genomic regions associated with a wide variety of diseases, including heart disease, diabetes and different types of cancer.
Our approach
Our research will explore one of the fundamental problems remaining in bioscience: the ‘Sequence to Expression’ conundrum. To understand the interplay of DNA, transcription factors and tissue-specific enhancer elements that regulate gene expression – the second genetic code – we will apply three complementary approaches:
- biochemical – based on measuring affinities of DNA-binding proteins to all possible DNA sequences, followed by building physically realistic models of gene expression
- machine-learning – based on learning the full sequence-to-activity map from large-scale functional genomics data
- genome-sequence based computation – based on predicting protein structures and affinities from sequence, and gene expression from these parameters.
Overall, our approach will be to study individual protein-DNA interactions in the presence of nucleosomes and/or cellular extracts and, at the systems level, to measure the activity of regulatory elements in a massively parallel fashion. We will then apply predictive computational modelling, using both machine learning and interpretable physical models, to analyse the data and to build a generative model of gene expression.
This approach will enable us to divide the ‘sequence to expression’ problem into a set of smaller questions that we will answer by:
- coordinated, large-scale high-throughput experimentation
- developing cutting-edge technologies for scalable data generation
- creating and employing synthetic biology
- automating our experiments to deliver at scale
- developing advanced computational tools to analyse and decipher the resulting vast volume of genomic data.
Aims
Our work will seek to:
- Determine binding specificities of transcription factors (TFs) on free and nucleosomal DNA in the absence and presence of cytosine methylation.
- Predict affinity of TFs to DNA based on their protein sequence.
- Measure activity of TFs and regulatory elements in vivo.
- Build models to understand transcriptional regulation based on the biochemical parameters.
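As a rough illustration of what an interpretable physical model built from biochemical parameters can look like, the sketch below scores DNA sites with an additive per-position energy matrix and converts the energy into an equilibrium occupancy. It is a textbook-style simplification, not the group's method: the matrix values, the 4-bp site length, and the names `ENERGY_MATRIX`, `binding_energy`, `occupancy` and `best_site` are all hypothetical and chosen only for this example.

```python
import math

# Hypothetical per-position binding energies (in units of kT) for a 4-bp site.
# The values are illustrative only and are not measured data.
ENERGY_MATRIX = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 2.0, "T": 0.0},
]


def binding_energy(site):
    """Sum per-position energies, assuming positions contribute independently."""
    if len(site) != len(ENERGY_MATRIX):
        raise ValueError("site length must match the energy matrix")
    return sum(column[base] for column, base in zip(ENERGY_MATRIX, site))


def occupancy(site, mu=0.0):
    """Equilibrium probability that the site is bound.

    mu is an assumed chemical potential (in kT) standing in for the
    concentration of the transcription factor.
    """
    return 1.0 / (1.0 + math.exp(binding_energy(site) - mu))


def best_site(sequence):
    """Return the lowest-energy window in a longer sequence."""
    k = len(ENERGY_MATRIX)
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return min(windows, key=binding_energy)


if __name__ == "__main__":
    for site in ("ACGT", "ACGA", "TTTT"):
        print(site, binding_energy(site), round(occupancy(site), 3))
    print(best_site("TTACGTGG"))
```

The additive-energy assumption is the simplest possible choice; effects such as nucleosomes, cytosine methylation or cooperativity between factors would require additional terms that this sketch does not attempt to model.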