MitoHiFi
A Python pipeline for mitochondrial genome assembly from PacBio high fidelity reads, developed within the Darwin Tree of Life Project.
The full paper introducing MitoHiFi is published in BMC Bioinformatics.
At the time of publication, MitoHiFi had been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. In addition, inspection of 60 mitochondrial genomes assembled with MitoHiFi, for species that already have reference sequences in public databases, showed the widespread presence of previously unreported repeats.
Background
PacBio high fidelity (HiFi) sequencing reads are both long (15–20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and more contiguous genomes. In eukaryotes, the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. However, a dedicated tool for mitochondrial genome assembly using HiFi reads was, until recently, missing.
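To make the "> Q20" figure concrete, the short sketch below converts a Phred quality score into a per-base error probability using the standard relation P = 10^(-Q/10). The function names are illustrative only and are not part of MitoHiFi.

```python
def phred_to_error_probability(q: float) -> float:
    """Return the probability that a base call is wrong for Phred score q."""
    return 10 ** (-q / 10)


def expected_errors(read_length: int, q: float) -> float:
    """Return the expected number of miscalled bases in a read of the given
    length, assuming every base has the same Phred score q."""
    return read_length * phred_to_error_probability(q)


if __name__ == "__main__":
    for q in (20, 30, 40):
        p = phred_to_error_probability(q)
        print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.4%}")
```

Under this relation, Q20 corresponds to one expected error per 100 bases, so a 15 kb read sitting exactly at the Q20 threshold would carry about 150 errors. Reads above the threshold carry fewer.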
About MitoHiFi
MitoHiFi was developed within the Darwin Tree of Life Project – an affiliated project of the Earth BioGenome Project – to assemble mitochondrial genomes from the HiFi reads generated for target species. The Darwin Tree of Life Project ultimately aims to sequence all eukaryotic species on the archipelago of Britain and Ireland.
The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence in FASTA format along with annotation of protein and RNA genes. Variants arising from heteroplasmy (the presence of more than one organelle type within a cell, as in plants, for example) are assembled independently using different tools. Nuclear insertions of mitochondrial sequences are identified and excluded from the organellar genome assembly.
MitoHiFi is written in Python and is freely available on GitHub. It is also available, together with its dependencies, as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
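As a minimal sketch, the container image named above could be fetched from a Python script as follows. It assumes Docker is installed and on the PATH. The helper names are hypothetical, and only the standard `docker pull` command is used. The options for running MitoHiFi itself are not shown here and should be taken from the tool's own documentation.

```python
import shutil
import subprocess
from typing import List

# Image reference exactly as given in the text above.
MITOHIFI_IMAGE = "ghcr.io/marcelauliano/mitohifi:master"


def build_pull_command(image: str = MITOHIFI_IMAGE) -> List[str]:
    """Build the argument list for `docker pull <image>` (hypothetical helper)."""
    return ["docker", "pull", image]


def pull_image(image: str = MITOHIFI_IMAGE) -> None:
    """Pull the image, raising an error if Docker is missing or the pull fails."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker executable not found on PATH")
    subprocess.run(build_pull_command(image), check=True)


if __name__ == "__main__":
    # Only print the command; call pull_image() to actually download the image.
    print(" ".join(build_pull_command()))
```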
Sanger Institute Contributors
Professor Mark Blaxter
Programme Lead for Tree of Life Programme and Senior Group Leader
Dr Richard Durbin
Associate Faculty
Ksenia Krasheninnikova
Senior Bioinformatician
James Torrance
Senior Bioinformatician
Dr Marcela Uliano-Silva
Senior Bioinformatician