Proceeding talk – Theme: Genome.
Abstract
We have developed an approach to condense multiple annotated genome sequences into a single representation, a pan-genome. As in other approaches, the core of our pan-genome is a compressed de Bruijn graph. What differentiates our method is that we construct the pan-genome in a Neo4j graph database, which scales to arbitrary graph sizes and so allows the analysis of large collections of complex eukaryotic genomes. The pan-genome graph is built by an online algorithm whose run time is linear in the total sequence length. Beyond construction, we provide functionality for annotating pan-genomes, grouping genes, retrieving sequences, and comparing pan-genomes. The pan-genome is stored on disk, and new genomes or annotated features can be added to it. We have implemented PanTools, a stand-alone command-line Java application for the representation, storage and exploration of pan-genomic data.
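
To illustrate what a compressed de Bruijn graph over several sequences looks like, here is a minimal Python sketch. It is not the PanTools implementation: it builds the graph in memory in one batch rather than online in Neo4j, it ignores reverse-complement strands and annotations, and the function names `build_de_bruijn` and `compress` are hypothetical. It only shows the general idea that k-mers shared between genomes collapse into shared nodes and that non-branching paths are merged into single labelled nodes.

```python
def build_de_bruijn(sequences, k):
    """Collect k-mer nodes and successor/predecessor links from all sequences.

    Assumption: forward strand only; a real pan-genome tool would also
    handle reverse complements.
    """
    nodes, succ, pred = set(), {}, {}
    for seq in sequences:
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ.setdefault(a, set()).add(b)
            pred.setdefault(b, set()).add(a)
    return nodes, succ, pred


def compress(nodes, succ, pred):
    """Merge every maximal non-branching path into one labelled node."""

    def is_start(n):
        # A node starts a path unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        preds = pred.get(n, set())
        if len(preds) != 1:
            return True
        (p,) = preds
        return len(succ.get(p, set())) != 1

    def walk(start, visited):
        label, cur = start, start
        visited.add(start)
        while True:
            nxt = succ.get(cur, set())
            if len(nxt) != 1:
                break
            (n,) = nxt
            if n in visited or is_start(n):
                break
            label += n[-1]  # consecutive k-mers overlap by k-1 characters
            visited.add(n)
            cur = n
        return label

    visited = set()
    labels = []
    for n in sorted(nodes):
        if is_start(n):
            labels.append(walk(n, visited))
    # Isolated cycles contain no start node; pick them up in a second pass.
    for n in sorted(nodes):
        if n not in visited:
            labels.append(walk(n, visited))
    return labels


if __name__ == "__main__":
    genomes = ["ACGTT", "ACGAT"]  # toy sequences, not real data
    print(compress(*build_de_bruijn(genomes, k=3)))
```

In this toy input the two sequences share the k-mer `ACG` and then diverge, so the compressed graph has one shared node followed by one node per variant. Two identical sequences would collapse into a single node.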
Authors
Siavash Sheikhizadeh Anari, Wageningen University, Netherlands
Eric Schranz, Wageningen University, Netherlands
Mehmet Akdel, Wageningen University, Netherlands
Dick de Ridder, Wageningen University, Netherlands
Sandra Smit, Wageningen University, Netherlands
