PT10 – PanTools: representation, storage, and exploration of pan-genomic data

Proceedings talk – Theme: Genome.

Abstract

We have developed an approach to condense multiple annotated genome sequences into a single representation. As in other approaches, the core of our pan-genome is a compressed de Bruijn graph. What differentiates our method is that we construct the pan-genome in a Neo4j graph database, which scales to arbitrary graph sizes and thus allows the analysis of large collections of complex eukaryotic genomes. The pan-genome graph is built with an online algorithm whose run time is linear in the total sequence length. Beyond construction, we provide functionality for annotating pan-genomes, grouping genes, retrieving sequences, and comparing pan-genomes. The pan-genome is stored on disk, and new genomes or annotated features can be added to it later. We have implemented these methods in PanTools, a stand-alone command-line Java application for the representation, storage and exploration of pan-genomic data.
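To illustrate what a compressed de Bruijn graph over several genomes looks like, here is a minimal in-memory sketch. It is not PanTools' implementation. PanTools is written in Java and stores its graph in Neo4j, whereas this sketch keeps everything in Python dictionaries. It assumes forward strands only (no reverse-complement or canonical k-mers) and draws edges only between k-mers that are adjacent in at least one input genome. The function names `build_dbg` and `compress` are hypothetical. Each k-mer records the genomes it occurs in, and non-branching paths are merged into unitigs, which become the nodes of the compressed graph.

```python
from collections import defaultdict


def build_dbg(genomes, k):
    """Hypothetical helper: node-centric de Bruijn graph over k-mers.

    Assumes forward strands only; an edge u -> v is added when v follows u
    directly in at least one genome. Also records which genomes contain each k-mer.
    """
    succ, pred, occurs = defaultdict(set), defaultdict(set), defaultdict(set)
    for gid, seq in genomes.items():
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            occurs[kmer].add(gid)
            if prev is not None:
                succ[prev].add(kmer)
                pred[kmer].add(prev)
            prev = kmer
    return succ, pred, occurs


def compress(succ, pred, occurs):
    """Hypothetical helper: merge non-branching paths into unitigs.

    Returns (unitig sequence, sorted ids of genomes containing at least one
    of its k-mers) tuples.
    """
    visited, unitigs = set(), []

    def walk(start):
        path, cur = [start], start
        visited.add(start)
        # Extend while the current k-mer has exactly one successor
        # and that successor has exactly one predecessor.
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        # Spell the unitig: first k-mer plus the last base of each following k-mer.
        seq = path[0] + "".join(km[-1] for km in path[1:])
        gids = sorted(set().union(*(occurs[km] for km in path)))
        unitigs.append((seq, gids))

    def is_start(n):
        # n begins a unitig unless its sole predecessor has n as its sole successor.
        if len(pred[n]) != 1:
            return True
        (p,) = pred[n]
        return len(succ[p]) != 1

    for node in list(occurs):
        if node not in visited and is_start(node):
            walk(node)
    for node in list(occurs):  # anything left lies on an isolated cycle
        if node not in visited:
            walk(node)
    return unitigs


if __name__ == "__main__":
    genomes = {"g1": "ACGTAC", "g2": "ACGTTC"}
    succ, pred, occurs = build_dbg(genomes, k=3)
    for seq, gids in compress(succ, pred, occurs):
        print(seq, gids)
```

For the two toy genomes above, the shared prefix collapses into one unitig, `ACGT`, found in both genomes. The divergent suffixes become the genome-specific unitigs `GTAC` (g1) and `GTTC` (g2). Because Python string slicing and hashing cost O(k) per k-mer, this sketch runs in O(kL) for total sequence length L. It should not be read as a reproduction of the linear-time bound stated for PanTools.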


Authors

Siavash Sheikhizadeh Anari, Wageningen University, Netherlands
Eric Schranz, Wageningen University, Netherlands
Mehmet Akdel, Wageningen University, Netherlands
Dick de Ridder, Wageningen University, Netherlands
Sandra Smit, Wageningen University, Netherlands