Tree of Life Informatics Infrastructure

Tree of Life Programme

The Informatics Infrastructure team provides support for the production of reference genome assemblies and large-scale genome analyses in the Tree of Life programme, and helps with the management and use of IT resources.

The Tree of Life projects will generate tens of thousands of high-quality genomes over the coming years, more than have ever been sequenced before. It is a challenging and extremely exciting task that will shape the future of biology, and the team’s role is to provide the platform for assembling and analysing those genomes at an unprecedented scale. We are the interface between the Tree of Life teams (assembly production and faculty research) and Sanger’s IT teams, and we work together with the informatics teams of the other programmes.

The team is organised into three groups.

Data management

Our data curators and managers maintain the integrity, consistency, and quality of multiple databases used in production, including Genomes on a Tree (GoaT), Sample Tracking System (STS), Collaborative Open Plant Omics (COPO), and BioSamples.

Bioinformatics

Our bioinformaticians develop the suite of analysis pipelines that will run on every genome produced by the Tree of Life programme, providing a central database of core results available to all.

Systems

We develop and maintain some of the core systems used in production, including those for executing and tracking all bioinformatics pipelines, and we deploy third-party web applications for internal use.

The team uses a wide range of technologies, frameworks and programming languages, including Nextflow, Python, Conda, Jira, LSF, Singularity, and Kubernetes. The technology wheel below shows most of their logos. How many can you recognise? Let us know on the Sanger Tree of Life Twitter account.

Our people

Group lead

Dr Matthieu Muffato

Informatics Infrastructure Team Lead

I lead the Informatics Infrastructure team of the Tree of Life programme, which guides the implementation and delivery of the genome assembly pipelines, and provides support for large-scale genome analyses for the Tree of Life faculty teams.

Core team

Mr Paul Davis

Data Manager

Ene Göktan

Informatics & Digital Associate

Dr Cibele Sotero-Caio

Genomic Data Curator - Tree of Life Genomics

Previous core team members

The following were also members of this team:

Bethan Yates
Zaynab Butt

Associated research

Collaborations

Collaboration

25 Genomes for 25 Years

The project's primary goal was to sequence 25 novel genomes representing UK biodiversity, as part of the Wellcome Sanger Institute' ...

Collaboration

Aquatic Symbiosis Genomics Project

An ambitious project to read the genomes of 1,000 freshwater and marine species that represent more than 500 symbiotic relationships, ...

Collaboration

BIOSCAN

The BIOSCAN project is studying the genetic diversity of 1,000,000 flying insects from across the UK over a five ...

Collaboration

Darwin Tree of Life Project

The Darwin Tree of Life Project is a collaboration by scientific partners to produce high-quality reference genomes for all known species ...

Collaboration

The ANOSPP Project

The ANOSPP Project aims to inform and improve vector control by improving understanding of the diversity of mosquito species that transmit ...

Collaboration

Tree of Sex Initiative

Tree of Sex is a large initiative to gather reproductive data for all eukaryotic species across the tree of life.

Collaboration

Vertebrate Genomes Project

The Vertebrate Genomes Project (VGP) at the Sanger Institute aims to provide reference quality assemblies for hundreds of fish, rodents and ...

Tools & software

Tool

BlobToolKit

BlobToolKit is a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies.

Tool

Genome on a Tree (GoaT)

GoaT is a powerful data aggregator and portal to explore and report underlying data for the eukaryotic tree of life. It ...

Tool

Tree of Life pipeline suite

Workflows and tools to investigate the genomic diversity of complex organisms.

Data

Data set

Genome Notes - Darwin Tree of Life

Genome Notes are the DNA sequences of the reference genomes of the 70,000 species of Britain and Ireland as ...

Data set

Vertebrate Genomes Sequencing

The Sanger Institute is developing a major programme in biological diversity genome sequencing across the tree of life. One of the ...

Related groups

Science group

Blaxter Group

Evolutionary Genomics

All life is linked by the common thread of DNA, modified through evolution. We use whole genome sequences to explore the ...

Science group

Cellular Genomics Informatics

Cellular Genomics

Our team provides efficient access to cutting-edge analysis methods, environments and pipelines for the Cellular Genetics programme, which leads and is involved ...

Science group

Genome Reference Informatics Team

Tree of Life Programme

The Genome Reference Informatics Team analyses genome assemblies to reveal and correct quality issues and to identify and add variation. It ...

Science group

Human Genetics Informatics (HGI)

Human Genetics

Human Genetics Informatics (HGI) supports the scientific aims of the Human Genetics programme by developing and operating computational analysis workflows, managing ...

Science group

Informatics Support Group

High Performance Computing

We deliver the at-scale computational platforms that enable the Sanger Institute’s scientists to deliver genomic research that others are unable ...

Science group

Informatics and Digital Solutions

Scientific Computing

We support the Sanger Institute’s mission to deliver innovative and ambitious genomics research at a scale to improve human health ...

Science group

Jaron Group

Evolution of unusual reproductive mechanisms

We explore how evolution is driven by changes in DNA sequence and chromosome structure by analysing and comparing the genomes of ...

Science group

Lawniczak Group

Evolutionary genetics

Our research group uses genomics to investigate insect biodiversity and malaria transmission.

Science group

Meier Group

Genomics of rapid speciation and adaptation

Biodiversity is very unevenly distributed across the tree of life. While the evolution of new species typically takes millions of years, ...

Science group

New Pipeline Group (NPG)

Sequencing Informatics

NPG is responsible for the delivery of DNA Pipelines’ data products and the provision of informatics expertise and QC systems.

Science group

Parasites and Microbes Informatics

Parasites and Microbes

The Parasites and Microbes Informatics team develops and maintains software applications and systems to support the research activities of the Parasites ...

Science group

Production Genomics

Tree of Life Programme

From sample to assembly and everything around it

Science group

Production Software Development

Laboratory Information and Management Systems (LIMS) compute and infrastructure

We create and support the software and systems the Sanger Institute’s scientists rely on at every step of their experiments. We ...

Science group

Sebé-Pedrós group

Comparative regulatory genomics

We are interested in the diversity, regulation, and evolution of cell types across the tree of life.

Science group

Teeling Group

Mammalian phylogenetics and comparative genomics

We seek to uncover the genetic signatures of survival that enable species to adapt to an ever-changing environment. To achieve this, ...

Science group

Tree of Life Assembly

Tree of Life Programme

The Tree of Life assembly team develops and maintains QC, assembly and analysis pipelines as part of the Tree of Life ...

Science group

Tree of Life Core Laboratory Team

Tree of Life Programme

The Core Laboratory team provides molecular biology support for all projects within the Tree of Life Programme.

Science group

Tree of Life Delivery and Operations

Tree of Life Programme

Programme management, operations management and support team for the Tree of Life Programme

Science group

Tree of Life Enabling Platforms

Tree of Life Programme

Our team is responsible for writing and maintaining the core software used in the Tree of Life programme. We work closely ...

Science group

Tree of Life Sample Management

Tree of Life Programme

Our team oversees the supply of samples – collected from organisms in the field or from the laboratory – that feed into the ...

Wellcome Sanger Institute

Programmes and Facilities

Programme

Tree of Life

We generate and use high-quality genome sequences to explore the evolution of life, provide the raw materials for new biotechnology and ...
