Proceeding talk – Theme: Genome.
Abstract
A fundamental challenge in performing sequence assembly using a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and under most formulations the assembly problem is NP-hard. This restricts most practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. This shift in paradigm allows us to derive a new assembly algorithm – called the Not-So-Greedy algorithm – which can efficiently solve the assembly problem whenever the reads contain enough information for perfect assembly. In practice, these theoretical insights lead to a new algorithm for the construction of a sparse read-overlap graph, which is shown to outperform the standard string graph approach.
Authors
Ilan Shomorony, United States UC Berkeley, United States
Samuel Kim, UC Berkeley, United States
Thomas Courtade, UC Berkeley, United States
David Tse, Stanford University, United States
