New era of quality and scale in genome sequencing will drive biological discovery
The Vertebrate Genomes Project (VGP) reaches a significant milestone today (29 April 2021) with a set of studies focused on genome assembly quality and standardization for the field of genomics. The flagship study appears in a special issue of Nature, with companion papers published simultaneously in other scientific journals. It presents 16 high-quality diploid vertebrate reference genome assemblies representing all the top-level groups of animals with backbones. These include the Canada lynx, platypus, greater horseshoe bat, zig-zag eel and Anna’s hummingbird.
The genomes published here, and those sequenced in the future, will be a valuable resource for research into vertebrate evolution, biology and biodiversity, as well as health and disease.
Only in the last five years has it become possible to create high-quality reference genomes without considerable time, effort and expense; before then, assembly quality was limited by technological barriers. Over this period, the VGP has taken advantage of dramatic improvements in sequencing technologies to begin producing high-quality reference genomes for the approximately 70,000 living vertebrate species.
“It truly was a challenge to design a pipeline applicable to highly diverged genomes. Our largest genome, which was 5 Gb in size, broke almost every tool commonly used in assembly processes. The extreme levels of heterozygosity or repeat content posed a big challenge. This is just the beginning; we are continuously improving our pipeline in response to new technology improvements.”
Arang Rhie, first author of the flagship paper, from the National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
The VGP’s approach combines assembly pipelines with manual curation to close major gaps and fix other errors, such as false gene duplications, losses or gains. Manual curation of genome assemblies was previously considered too time-consuming for high-throughput projects. Since the VGP began, however, the curation process has been refined and sped up to the point where it is now a key part of high-quality reference genome pipelines.
“Our new approach to produce structurally validated, chromosome-level genome assemblies at scale will be the foundation of ground-breaking insights in comparative and evolutionary genomics. Curation was a niche job at the beginning of VGP, so it’s amazing to see how much it has progressed since then. It’s been quite an odyssey of discovery.”
Dr Kerstin Howe, lead of the VGP curation team at the Wellcome Sanger Institute
The quality of the VGP genome assemblies has enabled new discoveries that have implications for biodiversity and conservation, as well as human health and disease. The first reference genomes of six bat species, for example, revealed selection and loss of immunity-related genes that may underlie bats’ unique tolerance to viral infection. This finding opens up new avenues of research that are particularly relevant for emerging infectious diseases such as COVID-19.
On the conservation front, genomic analyses of the kākāpō, a flightless parrot studied in collaboration with Māori in New Zealand, and of the vaquita, a small porpoise and the most endangered marine mammal, studied in collaboration with officials in Mexico, suggest that both species have evolutionary and demographic histories of purging harmful mutations in the wild. The finding that these species have long persisted at small population sizes while at genetic equilibrium gives hope for their survival.
“These studies mark the start of a new era of genome sequencing that will accelerate over the next decade to enable genomic applications across the whole tree of life, changing our scientific interactions with the living world.”
Professor Richard Durbin, of the University of Cambridge and VGP sequencing hub lead at the Wellcome Sanger Institute
As a next step, the VGP will continue to work collaboratively across the globe and with other consortia to complete Phase 1 of the project: approximately one representative species from each of 260 vertebrate orders, with each species separated from the other Phase 1 species by at least 50 million years since their most recent common ancestor. The VGP will use these 260 species to create comparative genomic resources, including reference-free whole-genome alignments that will help reveal the detailed evolutionary history of these species and enable consistent gene annotations. Genome data are primarily generated at three sequencing hubs that have invested in the VGP’s mission: Rockefeller University’s Vertebrate Genome Lab, New York, USA; the Wellcome Sanger Institute, UK; and the Max Planck Institute, Germany.
Phase 2 will focus on representative species from each vertebrate family and is currently in the process of sample identification and fundraising. As part of its overall mission, the VGP has an open-door policy and welcomes others to join its efforts, from fundraising and sample collection to generating genome assemblies or contributing their own assemblies that meet the VGP metrics.
More information
The VGP involves hundreds of international scientists from more than 50 institutions in 12 countries and is a model of scientific cooperation, extensive infrastructure and collaborative leadership. As the first large-scale eukaryotic genomes project to produce reference genome assemblies meeting a specific quality standard, the VGP has also become a working model for other large consortia, including the Bat 1K, Pan Human Genome Project, Earth BioGenome Project, Darwin Tree of Life, and European Reference Genome Atlas, among others.
Publication
Rhie, A., McCarthy, S.A., Fedrigo, O. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021; 592: 737–746. DOI: 10.1038/s41586-021-03451-0
https://doi.org/10.1038/s41586-021-03451-0