Public Consortium Launches Final Phase of Human Genome Sequencing: 85 percent of Human Genome Sequence Assembled and Available to the Public
The Human Genome Project international consortium today announced the official launch of the final phase of the human genome sequencing project: the effort to decipher the 3 billion DNA letters that make up the human genome. The milestone marks the transition from the initial phase of generating a ‘working draft’ of the human DNA sequence to the final phase of producing the complete ‘finished’ sequence.
Sixteen genome centres around the world, from the United States and Europe to Japan and China, will officially begin Phase Two of the Human Genome Project tomorrow, May 9.
In fourteen months, Phase One has produced sequence covering the vast majority of the human chromosomes. The last remaining DNA from this first phase is already in the centres’ sequencing pipelines and will flow into public databases over the next six weeks.
Phase Two will involve producing a ‘finished’ sequence of the human genome by filling the gaps in the sequence and increasing the overall sequence accuracy to 99.99 percent.
Phase One: Working Draft
The goal of the first phase was to create a ‘working draft’ covering 90 percent of the euchromatic portion of the human DNA by sequencing large ‘clones’, each representing a segment of the genome. The draft sequence allows scientists to directly identify the vast majority of human genes, although the sequence itself still contains gaps and uncertainties.
The centres have so far produced and released sequences from overlapping clones containing a total of 3.2 billion DNA letters. Allowing for the overlaps, these segments cover approximately 85 percent of the human genome. (Producing the sequences of these clones actually involved generating more than 16 billion bases of raw DNA sequence information, with a typical clone being ‘covered’ with random sequences to a depth of more than five-fold.)
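The coverage figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes a genome size of exactly 3.0 billion letters, treats the 85 percent figure as a fraction of that whole, and uses the quoted minimums ("more than 16 billion", "more than five-fold") as exact values. The constant and function names are hypothetical.

```python
# Figures quoted in the release, rounded (assumption: treated as exact here).
GENOME_SIZE = 3.0e9        # ~3 billion DNA letters in the human genome
CLONE_SEQUENCE = 3.2e9     # letters in released clone sequences, overlaps included
RAW_SEQUENCE = 16.0e9      # raw sequence generated ("more than 16 billion bases")
FRACTION_COVERED = 0.85    # share of the genome covered once overlaps are removed


def mean_depth(total_bases: float, target_bases: float) -> float:
    """Average number of times each target base is represented."""
    return total_bases / target_bases


# Raw reads spread over the clone sequence give the shotgun depth per clone.
read_depth = mean_depth(RAW_SEQUENCE, CLONE_SEQUENCE)

# Clone sequence spread over the genome it actually covers gives the
# redundancy introduced by overlapping clones.
covered_bases = FRACTION_COVERED * GENOME_SIZE
overlap_redundancy = mean_depth(CLONE_SEQUENCE, covered_bases)

print(f"average read depth per clone: {read_depth:.1f}-fold")
print(f"genome letters covered: {covered_bases / 1e9:.2f} billion")
print(f"clone overlap redundancy: {overlap_redundancy:.2f}x")
```

The read depth comes out at about five-fold, matching the figure in the text, and the overlap redundancy shows why 3.2 billion letters of clone sequence cover only part of a 3-billion-letter genome.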
The remaining clones that will complete the working draft were selected in late April, and are now in process at the sixteen centres. The final ‘working draft’ data are now flowing into public databases at a rate of 10,000 DNA letters per minute, and will all be deposited by mid-June.
The ‘working draft’ is assembled in two steps. First, each clone is ‘assembled’ from its own sequence information. The clones are then ‘assembled’ together into a ‘layout’ on the human genome, based on their chromosomal locations.
The first comprehensive ‘layout’ of the human genome was constructed in mid-April by scientists in the international consortium. The layout shows the chromosomal positions and the detailed relationship among the more than 20,000 large clones used to sequence the genome; it also spotlights the remaining segments to be covered. The clones in the layout also have immense value beyond their immediate role as an aid in sequencing: they provide a permanent resource for human genetics, because they can be used for direct biological studies of gene function.
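The second assembly step, and the way a layout exposes the segments still to be covered, can be illustrated with a toy interval merge. This is a minimal sketch under stated assumptions: each clone's chromosomal position is already known as a simple start/end coordinate, the coordinates below are invented, each toy clone is an assumed 150,000 letters long, and `build_layout` is a hypothetical helper, not the consortium's software.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) position on a chromosome


def build_layout(clones: List[Interval], chrom_length: int) -> Tuple[List[Interval], List[Interval]]:
    """Order clones by chromosomal start, merge overlapping clones into
    contiguous segments, and report the uncovered gaps between them."""
    segments: List[List[int]] = []
    for start, end in sorted(clones):
        if segments and start <= segments[-1][1]:
            # This clone overlaps (or abuts) the current segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            # No overlap: a new contiguous segment begins here.
            segments.append([start, end])

    gaps: List[Interval] = []
    cursor = 0
    for start, end in segments:
        if start > cursor:
            gaps.append((cursor, start))  # region no clone covers yet
        cursor = end
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return [(s, e) for s, e in segments], gaps


# Hypothetical clone positions on a made-up 800,000-letter chromosome segment.
CHROM_LENGTH = 800_000
clones = [(400_000, 550_000), (0, 150_000), (530_000, 700_000), (120_000, 270_000)]

segments, gaps = build_layout(clones, CHROM_LENGTH)
covered = sum(end - start for start, end in segments)
print("segments:", segments)
print("gaps:", gaps)
print(f"covered: {covered / CHROM_LENGTH:.1%}")
```

Sorting by chromosomal start is what lets a single left-to-right pass both merge overlapping clones and reveal the gaps that remain to be filled.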
“It’s breathtaking to see the DNA sequences arrayed along the human chromosomes, from one end to the other. The individual contributions have fallen together to yield a global picture. We can now turn to plugging the remaining holes.”
Dr. Robert Waterston, Director of the Genome Sequencing Center at Washington University in St. Louis, Missouri
“The progress in human DNA sequencing has been stunning. The early projections have been left in the dust. The result has been an information explosion that is fuelling a revolution in biomedical research.”
Dr. Eric S. Lander, Director of the Whitehead Institute Center for Genome Research in Cambridge, Massachusetts
Dr. Lander attributed the acceleration in sequencing the human genome to advances in automation, informatics and organization at the various centres.
The sequence information from the ‘working draft’ has been released immediately and freely to the world, with no restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies providing information services to biotechnologists. Already, many tens of thousands of genes have been identified from the genome sequence. Moreover, the working draft pinpoints the location of these genes to high resolution, because each portion of the draft is derived from a clone of known location that spans roughly one twenty-thousandth (1/20,000) of the genome.
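The localization claim comes down to one division: if each clone spans about 1/20,000 of a roughly 3-billion-letter genome, a gene found in a draft clone is placed within a window of about that clone's length. The sketch below uses only the rounded figures from the release; the constant names are hypothetical.

```python
GENOME_SIZE = 3_000_000_000   # ~3 billion DNA letters (rounded figure from the release)
CLONES_PER_GENOME = 20_000    # each clone spans about 1/20,000 of the genome

# Integer division gives the approximate span of a single clone; a gene found
# in a draft clone is therefore localized to a window of about this size.
clone_span = GENOME_SIZE // CLONES_PER_GENOME
print(f"approximate localization window: {clone_span:,} letters")
```

Real clones vary in length, so this is an order-of-magnitude figure rather than an exact resolution.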
For example, the ‘working draft’ has allowed human geneticists to find genes responsible for dozens of inherited diseases, including breast cancer, hereditary deafness, stroke, epilepsy, diabetes and various skeletal disorders.
The draft sequence is also being used as a resource by the SNP Consortium, an industry-academia collaboration, to identify sites of DNA sequence variation in the human population. The Consortium has identified more than 150,000 such sites of variation, called single nucleotide polymorphisms (SNPs). These SNPs provide a powerful tool for studies of human disease and human history, and they are also being released into the public domain.
Finally, the draft sequence has propelled many basic biological studies. For example, researchers have recently used it to discover the molecular basis of taste, one of the five human senses.
Phase Two: Finishing
The goal of Phase Two is to produce a ‘finished’ sequence of the human genome by filling the gaps in the sequence and increasing the overall sequence accuracy to 99.99 percent. (The working draft already attains this level of accuracy at more than 90 percent of its DNA bases, but has somewhat greater uncertainty at the remaining positions.)
The process involves two activities: (1) performing additional sequencing of the clones used in Phase One; and (2) selecting and sequencing some additional clones from chromosomal segments not covered in Phase One.
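To put the 99.99 percent target in perspective, an accuracy of 99.99 percent corresponds to about one incorrect letter in every 10,000. The sketch below scales that up to the whole genome. It assumes, for illustration only, that errors are spread evenly and that the genome is exactly 3 billion letters; the names are hypothetical.

```python
GENOME_SIZE = 3_000_000_000   # ~3 billion DNA letters (rounded figure from the release)
TARGET_ACCURACY = 0.9999      # Phase Two finishing standard


def expected_errors(num_bases: int, accuracy: float) -> float:
    """Expected number of incorrect letters if each letter is correct with
    probability `accuracy` (a simplifying, illustrative assumption)."""
    return num_bases * (1.0 - accuracy)


per_10kb = expected_errors(10_000, TARGET_ACCURACY)
genome_wide = expected_errors(GENOME_SIZE, TARGET_ACCURACY)

print(f"errors per 10,000 letters: about {per_10kb:.0f}")
print(f"errors across the genome: about {genome_wide:,.0f}")
```

Even at the finishing standard, a few hundred thousand positions genome-wide could remain uncertain, which is why finished sequence is the reference against which single-letter mutations are compared.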
“With the final clones from Phase One in our pipelines, we can now turn our full attention to Phase Two. The Finishing Phase should proceed rapidly, based on all the experience that has been gained over the past year. While the target for completion is officially 2003, the great majority of the work will be accomplished much sooner.”
Dr. John Sulston, Director of the Sanger Centre, located near Cambridge, England
Although the ‘working draft’ sequence allows genes themselves to be recognized, the higher accuracy and completeness of ‘finished’ sequence make it a gold-standard reference that can be readily compared with individual patients’ DNA to identify the specific single-letter mutations that cause hereditary diseases.
In preparation for Phase Two, the international consortium has developed high-throughput methods for producing high-quality ‘finished’ genomic sequence. In the process, approximately 20 percent of the human genome (600 million bases) has been finished to the high standard of 99.99 percent accuracy and completeness. The finished sequence of human chromosome 22 was published in December 1999.
In a separate announcement today, scientists from the international consortium are announcing the publication in Nature of a paper reporting the finished sequence of human chromosome 21, the chromosome involved in Down’s syndrome. This work was led by scientists from Japan and Germany.
“The goal of Human Genome Sequencing is to provide a solid foundation for the next century of biomedical research. We won’t stop until every uncertainty that can be resolved is resolved.”
Dr. Richard Gibbs, Director of the Baylor College of Medicine Sequencing Center
The international consortium also re-affirmed today its commitment to immediate release of the Phase Two information into the public domain.
International Genome Summit Meeting
The launch of Phase Two coincides with an international summit of leaders from the sixteen genome centres, which will take place on Wednesday, May 10, at the Cold Spring Harbor Laboratory on Long Island, New York.
Background
Sequencing, which means determining the exact order of DNA’s four chemical bases (commonly abbreviated A, T, C and G), has been expedited in the Human Genome Project by technological advances in deciphering DNA and by the collaborative nature of the effort, which includes about 1,000 scientists worldwide working together effectively.
The Human Genome Sequencing Project aims to determine the sequence of the euchromatic portion of the human genome. The euchromatic portion excludes certain regions consisting of long stretches of highly repetitive DNA that encode little genetic information. Such regions are said to be heterochromatic. (Examples of heterochromatic regions include the centres of chromosomes, called centromeres.)
The international Human Genome Sequencing consortium includes scientists at 16 institutions in France, Germany, Japan, China, Great Britain and the United States. The five largest centres are located at: Baylor College of Medicine, Houston, Texas; Joint Genome Institute in Walnut Creek, California; Sanger Centre near Cambridge, England; Washington University School of Medicine, St. Louis; and Whitehead Institute, Cambridge, Massachusetts.
The project is funded by grants from government agencies and public charities in the various countries. These include the National Human Genome Research Institute at the US National Institutes of Health, the Wellcome Trust in England, and the US Department of Energy.
The total cost for Phase One (‘working draft’) is approximately $300 million worldwide, with roughly half ($150 million) being funded by the US National Institutes of Health.