Highlight talk – Theme: Data
Abstract
While DNA and RNA sequencing are not yet part of common clinical practice, they are increasingly adopted in the diagnostics of rare genetic disorders and targeted therapy in oncology. While large genomics centers have access to large computing facilities to process these data, smaller clinical centers and research groups face issues in this regard. We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node/multi-core compute infrastructure. A DNA-seq variant calling pipeline has been implemented according to the GATK Best Practices recommendations, supporting whole genome and whole exome sequencing. Using a 15-node computer cluster, Halvade processes the NA12878 dataset in less than three hours, a task that takes 12 days when executed sequentially. Even on a single workstation, Halvade significantly decreases processing time, thus enabling not only the scalability of large sequencing data sets, but also cutting down the time-to-result for a single sample.
Authors
Dries Decap, Ghent University – iMinds, Belgium
Joke Reumers, Janssen Research and Development, Belgium
Charlotte Herzeel, IMEC, Belgium
Pascal Costanza, Intel Corporation Belgium
Jan Fostier, Ghent University – iMinds, Belgium
Source of publication
2015, Bioinformatics, vol. 31 (15), pp. 2482-2488
