HT12 – Halvade: scalable sequence analysis with MapReduce

Highlight talk – Theme: Data

Abstract

Although DNA and RNA sequencing are not yet part of common clinical practice, they are increasingly adopted in the diagnosis of rare genetic disorders and in targeted therapy in oncology. Large genomics centers have access to large computing facilities to process these data, but smaller clinical centers and research groups often lack comparable resources. We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure. A DNA-seq variant calling pipeline has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster, Halvade processes the NA12878 dataset in less than three hours, a task that takes 12 days when executed sequentially. Even on a single workstation, Halvade significantly decreases processing time. It therefore not only scales to large sequencing data sets, but also cuts the time-to-result for a single sample.

Authors

Dries Decap, Ghent University – iMinds, Belgium
Joke Reumers, Janssen Research and Development, Belgium
Charlotte Herzeel, IMEC, Belgium
Pascal Costanza, Intel Corporation Belgium
Jan Fostier, Ghent University – iMinds, Belgium

Source of publication

2015, Bioinformatics, vol. 31 (15), pp. 2482-2488