AT08 – MOLGENIS Diagnostic Platform for Clinical Genomics

Application talk

Abstract

Background

With the cost of next-generation sequencing (NGS) decreasing rapidly, we expect thousands of patients to soon have whole-genome profiling. Although NGS-based genetic testing is a major improvement over Sanger sequencing (one-gene-at-a-time), its implementation is a huge challenge for diagnostic laboratories. Analysis of the thousands of variants per patient is still largely performed by hand. The current standard of genome diagnostics consists of NGS variant data generation (SNVs, CNVs, indels/SVs, etc.); reduction to known pathogenic genes (region of interest); filtering of known benign and pathogenic variants (using e.g. locus-specific databases, dbSNP, ClinVar, OMIM, HGMD, DECIPHER); elimination of common variants (i.e. minor allele frequency >1% in e.g. 1000 Genomes, ExAC, Genome of the Netherlands); analysis of inheritance patterns and matching to the disease phenotype; assessment of mutation type (e.g. premature stop codon, frameshift, missense); and estimation of variant pathogenicity using software tools. Based on this cumbersome process, each variant is classified by hand on a 1-5 scale. However, most variants are still left with an undecided score (2: unlikely pathogenic, 3: variant of unknown significance, or 4: likely pathogenic). Even when only a small subset of the genome is considered, a long list of variants remains (~6,000 when assessing genes only); this can take months for diagnosticians to evaluate.

Approach

With the advent of high-throughput next-generation sequencing, a large amount of new DNA data has been collected and reported in public databases such as Genome of the Netherlands, ExAC and 1000 Genomes. Other rich sources of genotype-to-phenotype data are the local databases of clinical genetics centers and the sample collections of large biobanks, such as the LifeLines cohort studies from the BBMRI infrastructure. Meanwhile, many new tools are being published. Unfortunately, these databases and tools are diverse and fragmented.
As a consequence, each genetic lab is currently spending massive resources duplicating the pre-competitive effort of validating and integrating all of these into its diagnostic workflow. To enable large-scale molecular data analysis, diagnostics labs need software to support the interpretation workflow, execute the necessary algorithms, provide real-time access to all the information required, and integrate multiple layers of evidence in an understandable way. While some commercial and academic software exists, we have seen that diagnostics labs need the flexibility to integrate new methods rapidly as they emerge and to share this work with other labs, a need that open source software can address.

Result

A few years ago we developed a research system for variant data integration and interpretation using the MOLGENIS open source platform, reusing its import wizard for flexible formats; APIs for R statistics, Python, REST and JSON; user and rights management; cross-dataset ontological harmonization; and data exploration tools including plotting, filtering, aggregation, complex queries, genome browsing and metadata inspection. Imported data is indexed using ElasticSearch to eliminate long loading times. To our surprise, these tools were immediately incorporated into production diagnostics, where they complement the labs' already sizable set of commercial tools. We therefore decided to continue development toward a flexible system that enables bioinformaticians to rapidly integrate and share new methods and data as user-friendly ‘apps’ with a capacity for continuous re-analysis, so that many ‘unsolved’ patients can be re-diagnosed as better methods emerge. The MOLGENIS/genomics platform now enables upload and filtering of VCF files and associated genotypes, is used as part of routine diagnostics at UMC Groningen, and has been used to pilot data sharing between Dutch diagnostics labs.
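
The core filtering steps described above (reduction to a region of interest, then elimination of common variants with a minor allele frequency above 1%) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MOLGENIS code: the `Variant` record, its field names, the `filter_candidates` function and the gene names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not a MOLGENIS data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    max_population_af: float  # highest allele frequency seen in any reference panel


def filter_candidates(variants, gene_panel, maf_threshold=0.01):
    """Keep variants inside the gene panel that are not common in the population.

    A variant is dropped when its gene is outside the region of interest or
    when its population allele frequency exceeds the threshold (1% by default).
    """
    return [
        v for v in variants
        if v.gene in gene_panel and v.max_population_af <= maf_threshold
    ]


# Illustrative input: gene names and frequencies are made up.
example_variants = [
    Variant("1", 100, "A", "G", "GENE_A", 0.0005),  # rare, in panel: kept
    Variant("1", 200, "C", "T", "GENE_A", 0.20),    # common: removed
    Variant("2", 300, "G", "A", "GENE_B", 0.0),     # outside panel: removed
]
candidates = filter_candidates(example_variants, gene_panel={"GENE_A"})
print([v.pos for v in candidates])  # [100]
```

In practice the allele frequencies would come from annotation against resources such as 1000 Genomes, ExAC or GoNL rather than being stored on the record by hand.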
High-throughput use cases such as multi-omics integration and NGS variant interpretation can benefit from web-based MOLGENIS upload formats and query performance, but also require a pre-filled toolbox to help process and understand these genetic variants. We have therefore added extensible variant ‘annotators’ that enable easy data enrichment (CADD, 1000G, ExAC, GoNL, SnpEff, ClinVar, CGD, HPO, etc.), application analysis protocols (automated variant interpretation, gene function prediction, risk prediction, etc.) and supporting algorithms (de novo variant discovery, symptom-to-disease matching, genome build liftover, etc.). These tools are also available as a command-line executable for use in routine analysis pipelines before uploading the results.

Example applications

* Fast diagnosis of newborns with unclear metabolic disorders. The quality of life and survival chances of newborn infants with a suspected severe genetic defect may be greatly increased by a fast diagnosis. At the UMCG, these children are whole-genome sequenced to identify causal pathogenic variants. After wet-lab sample processing and variant calling, MOLGENIS/genomics is used to quickly enrich genomic variants with population allele frequencies, known disease genes and pathogenicity estimates, before final filtering and variant interpretation. Using this workflow, a diagnosis can be established within days and with a relatively high degree of success (around 30% yield).
* VKGL data sharing. Although data sharing is increasingly encouraged by the scientific community, there is relatively little guidance for diagnostics laboratories on sharing genomic data in practice. To this end, the Dutch Society for Clinical Genetic Laboratory Diagnostics (VKGL) has been evaluating MOLGENIS as a variant sharing solution.
MOLGENIS is able to import aggregated VCF data from various medical centers and represent the combined set of variants in a single view that allows clinicians to quickly check whether a certain variant has been seen and interpreted in another lab. This system would enable more powerful filtering of population variants before interpretation, as well as a decrease in interpretation time when variants have received a previous verdict from a trusted colleague.

Conclusion & Future work

The MOLGENIS community is committed to continuing the development of valuable diagnostic and personalized genomic medicine apps, offered as a sharing platform for best-practice data and pipelines, well-curated reference knowledge bases, and optimal user interfaces. In particular, we plan to add a complete variant calling pipeline and an automated system for re-analysis of patients using the continuously improving methods and data. We envision professional software providers taking up the hosting and support of this system (software as a service) and aim for a self-service level of maturity. All software code and documentation are available for free as open source at www.molgenis.org/genomics.
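
The cross-lab lookup described for the VKGL use case, checking whether a variant has already been seen and classified elsewhere, can be illustrated with a small Python sketch. It is not the MOLGENIS implementation: the function names `build_shared_index` and `lookup_variant`, the lab names, and the tuple layout are assumptions made for illustration only. Classifications use the 1-5 scale mentioned in the Background.

```python
from collections import defaultdict


def build_shared_index(lab_submissions):
    """Combine per-lab variant classifications into one lookup table.

    lab_submissions maps a lab name to an iterable of
    (chrom, pos, ref, alt, classification) tuples, where classification
    is a score on the 1-5 scale.
    """
    index = defaultdict(dict)
    for lab, records in lab_submissions.items():
        for chrom, pos, ref, alt, classification in records:
            index[(chrom, pos, ref, alt)][lab] = classification
    return index


def lookup_variant(index, chrom, pos, ref, alt):
    """Return {lab: classification} for a variant, or {} if no lab has seen it."""
    return dict(index.get((chrom, pos, ref, alt), {}))


# Illustrative submissions: lab names and variants are made up.
shared = build_shared_index({
    "lab_a": [("1", 100, "A", "G", 4)],
    "lab_b": [("1", 100, "A", "G", 3), ("2", 5, "C", "T", 2)],
})
print(lookup_variant(shared, "1", 100, "A", "G"))  # {'lab_a': 4, 'lab_b': 3}
```

Keying on chromosome, position, reference and alternate allele assumes that all labs report variants on the same genome build and with the same normalization; otherwise a liftover or normalization step would be needed first.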

Authors

Joeri van der Velde, University Medical Center Groningen, Netherlands
Bart Charbon, University Medical Center Groningen, Netherlands
Dennis Hendriksen, University Medical Center Groningen, Netherlands
Mark de Haan, University Medical Center Groningen, Netherlands
Cisca Wijmenga, University Medical Center Groningen, Netherlands
Tom de Koning, University Medical Center Groningen, Netherlands
Rolf Sijmons, University Medical Center Groningen, Netherlands
Richard Sinke, University Medical Center Groningen, Netherlands
Morris Swertz, University Medical Center Groningen, Netherlands