Collaboration: Sketching algorithms for fast and memory-efficient long-read genome assembly, Massachusetts Institute of Technology (MIT), USA

Date: 05/14/2021

Location: Massachusetts Institute of Technology (MIT), USA

I’m looking for collaborators interested in long-read (PacBio CLR/HiFi and Oxford Nanopore) sequence analysis and in string and graph algorithms, for a possible project on accelerating long-read genome assembly with sketching algorithms. The project would build on our recent work on minimizer-space de Bruijn graphs (Ekim, Berger, Chikhi, RECOMB 2021 & Cell Systems), which produces highly contiguous PacBio HiFi assemblies quickly and with low memory use. It would focus on incorporating long Oxford Nanopore reads to polish the assembly graph generated from PacBio HiFi reads, and potentially on extending the assembler to produce phased diploid assemblies. Other sketching methods for improving contiguity could also be discussed.

Ideally, collaborators would have some familiarity with genome assembly algorithms, but it’s not required. Some coding experience is required; the codebase is currently in Rust, which doesn’t have a steep learning curve:

If you’re interested, feel free to get in touch with me. Thanks in advance!

Barış Ekim



RSG-Turkey is a member of the International Society for Computational Biology (ISCB) Student Council (SC) Regional Student Groups (RSG). We are a non-profit community of early-career researchers interested in computational biology and bioinformatics.

