Alumni
This person is a member of Sanger Institute Alumni.
I design algorithms and data structures, and implement software tools and libraries based on them. My work is centered around the Burrows-Wheeler transform (BWT), which can be used for compressing, indexing, and analyzing sequence data. I am currently working on the following topics:
- Indexing graphs. BWT-based text indexes are widely used with sequences. Instead of having a single reference sequence, we can make the reference a graph by adding genetic variation to it. We can generalize BWT-based text indexes to graphs, but then we have to rethink most of the algorithms that use them. The resulting index can also be used to speed up de novo genome assembly.
- Massive datasets. We are all drowning in sequence data. There are many BWT-based methods for analyzing it, but most of them are not suitable for datasets larger than a few gigabytes. Scaling the methods up to terabytes poses significant challenges.
- Relative data structures. When two datasets are similar, the data structures we build for them are often similar as well. We can compress the data structures for individual genomes relative to the reference genome, while still being able to simulate the individual structures efficiently.
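
For readers unfamiliar with the BWT, the following is a minimal Python sketch of the transform and of the backward search that BWT-based text indexes rely on. It is an illustration only, not the implementation used in any of the tools mentioned here: the function names `bwt` and `count_occurrences` are hypothetical, the sentinel character `$` is an assumption, and the naive construction by sorting rotations would not scale to real sequence data.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Illustration only; practical tools build the BWT via suffix arrays
    or other space-efficient constructions.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # first_row[c] = number of characters in the text that sort before c
    first_row = {}
    total = 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)

    def rank(c, i):
        # Occurrences of c in bwt_str[:i]; a real index answers this in
        # constant or logarithmic time instead of scanning.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + rank(c, lo)
        hi = first_row[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                             # annb$aa
print(count_occurrences(transformed, "ana"))   # 2
```

The search processes the pattern from right to left and only needs the transformed string plus rank queries over it, which is why the BWT can serve as a compressed index rather than just a compression step.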