Wellcome Sanger Institute

Information Communications Technology

Archive Page

This page is maintained as a historical record and is no longer being updated.

Overview

Our goal is "To provide World Class High Performance Computing and First Class Production Platforms and Services for genome and biodata research."

In IT we run one of the world’s largest life sciences data centres. Currently, the Data Centre contains about 64PB of storage capacity and has 38,000 processing cores. We are adding to this at roughly 5PB a year from the analysis performed by the scientists, plus 2PB a year (about 5TB a day) from the sequencers.
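As a rough check on these figures, the short sketch below converts the sequencer rate from PB per year to TB per day and compares the combined yearly growth with the quoted capacity. It assumes decimal units (1PB = 1000TB) and a 365-day year, and it reads the two growth figures as additive; the function and variable names are illustrative only.

```python
# Back-of-envelope check of the storage figures quoted above.
# Assumptions (not stated in the text): decimal units, 365-day year,
# and the analysis and sequencer growth figures are additive.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def pb_per_year_to_tb_per_day(pb_per_year: float) -> float:
    """Convert a growth rate in PB/year to TB/day."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


capacity_pb = 64          # quoted storage capacity
analysis_pb_per_year = 5  # quoted growth from scientists' analysis
sequencer_pb_per_year = 2 # quoted growth from the sequencers

sequencer_tb_per_day = pb_per_year_to_tb_per_day(sequencer_pb_per_year)
total_pb_per_year = analysis_pb_per_year + sequencer_pb_per_year
growth_fraction = total_pb_per_year / capacity_pb

print(f"Sequencers: {sequencer_tb_per_day:.1f} TB/day")
print(f"Total growth: {total_pb_per_year} PB/year "
      f"({growth_fraction:.1%} of quoted capacity)")
```

Under these assumptions, 2PB a year works out to a little over 5TB a day, which agrees with the figure given for the sequencers.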

The IT infrastructure at the Sanger Institute is one of the most extensive in the life sciences. Every day we serve data to researchers across the globe; every week our web pages receive 80,000 page views across more than 50 web domains.

At the turn of the century the Sanger Institute had just finished a big push to produce its share of the Human Genome Project, generating DNA sequence for public release. It was a major scientific endeavour, and throughout the project it posed significant challenges for the IT infrastructure.

Now, with the tremendous sequencing capacity of emerging next-generation technologies, the IT infrastructure continues to grow dramatically and adapt to the Institute’s scientific needs.

The high-performance compute facility (supercomputer) runs about two million tasks (programs or jobs) a week to cater for the research programmes and sequencing production pipelines. It can be thought of as one very large computer which is roughly 20,000 times larger than the average PC.

This is a fantastic tool for doing science. Our aim has always been to make this Big Data facility as easy to use as possible, so that our scientists can quickly extract the information they are looking for.

Discussions with all other large-scale genome sequencing centres are integral to maintaining and improving our IT infrastructure. We must address the particular challenges posed by the explosion of genetic sequence data, and we are working with other centres to investigate international models for future data sharing, such as the Global Alliance for Genomics and Health.

To that end we are tenants in the Jisc Shared Data Centre, a collaborative effort between the Sanger Institute, Jisc, University College London (UCL), King’s College London, Queen Mary University of London (QMUL), the Francis Crick Institute and others. We keep a second copy of all of our sequencing data in this facility.

The same facility also hosts eMedLab, a collaborative project for scientific computing in a cloud services environment based on OpenStack. eMedLab is a collaboration between UCL, the Francis Crick Institute, the Sanger Institute, the European Bioinformatics Institute (EBI), the London School of Hygiene and Tropical Medicine, QMUL and others, with operational responsibility shared between UCL, the Crick and the Sanger Institute.

The shape of our IT infrastructure will change dramatically in the future. Large-scale collaborative science and an extremely diverse software landscape are driving us towards a more cloud-services-oriented approach over the next few years, allowing scientists from other organisations to run their own bespoke analyses against our data, and vice versa.

Likewise, the advent of genomics within the clinical space increases our requirements for security, validation and resilience. Meeting these needs while not sacrificing the flexibility required by the Institute’s cutting-edge research science is our key challenge over the next few years.

Related groups