ELIXIR talk – session: ELIXIR – Human genomics and translational data.
Abstract
With the advent of high-resolution, high-throughput experimental platforms, biomedical research has become more complex, with major shifts in the diversity and dimensionality of data. Consequently, translational research studies require solutions for the growing demands of data processing, workflow management and data storage. Because of the privacy issues arising from the clinical nature of translational research, combined with the size and diversity of experimental data, these studies need a secure framework for storage and analysis. Here we present two ongoing programmes applied to human data that seek to extend data accessibility, ensure long-term archiving and facilitate downstream analysis by using the European Genome-phenome Archive (EGA). The first is a scoping study for long-term storage of IMI OncoTrack data; the second integrates EGA into the TraIT workflows by referencing data in EGA from tranSMART and allowing data stored in EGA to be uploaded into Galaxy. EGA, run jointly by EMBL-EBI and CRG, provides long-term archiving and distribution of personally identifiable genetic and phenotypic data from biomedical research projects. Data at EGA are collected from individuals whose consent agreements authorise release only to bona fide researchers for specific research uses. Strict protocols govern how the EGA project manages, stores and distributes information, and each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to its data. As a long-term archive, EGA is well placed to provide continued access to, and management of, data from funded projects after those projects have ended.
To this end, EGA is working with IMI OncoTrack (Methods for systematic next generation oncology biomarker development), an international consortium of 22 multidisciplinary partners from academia, the pharmaceutical industry and SMEs, coordinated by Bayer and the Max Planck Institute for Molecular Genetics. Supported by the Innovative Medicines Initiative, it is one of Europe’s largest collaborative academic-industry research projects; it aims to systematically collect different types of data on colon cancer and to use in silico methods to analyse the “model cancer cell”. The project dataset includes genomic and transcriptomic data, imaging data, drug-response data from animal and cell culture studies, results of proteomic analyses (RPPA, MS and plasma protein analyses, and other biomarker information), and clinical data from the patients recruited to the study.
In addition to providing a long-term archiving and data-sharing solution for IMI OncoTrack beyond the funded project, this scoping study will also:
• establish a Data Access Committee and governance process to oversee requests for data use
• report the associated resource requirements, costs and outcomes, so that the scoping study can form the basis for IMI strategies on data
• extend the EGA data model to accurately reflect the project's meta-data
We report that, to ensure maximum visibility for the data, the meta-data have been divided into public and controlled-access meta-data, and the public meta-data mapped to ontologies where possible to maximise discoverability. This process has demonstrated a use case for a common data format for identifiable sample meta-data that must remain under controlled access, such as phenotype and location. We describe how we plan to link animal model and xenograft data to identifiable human data within EGA, and how the whole process has been designed with a view to generalising it to other use cases.
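The division of meta-data into public and controlled-access parts, with ontology mapping of the public part, can be illustrated with a minimal Python sketch. It assumes each sample's meta-data arrive as a flat dictionary and that only an explicit allow-list of fields may be published; the PUBLIC_FIELDS set, the split_metadata and annotate functions, the field names and the term lookup are all hypothetical and are not the EGA data model or any EGA API.

```python
from typing import Any

# Hypothetical allow-list of fields that may be published openly.
# Anything not listed (e.g. phenotype, location) is treated as
# controlled-access by default, so newly added fields fail safe.
PUBLIC_FIELDS = frozenset({"sample_id", "tissue", "sample_type"})


def split_metadata(record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Partition one sample record into (public, controlled) dictionaries.

    Every key ends up in exactly one of the two outputs.
    """
    public = {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
    controlled = {k: v for k, v in record.items() if k not in PUBLIC_FIELDS}
    return public, controlled


def annotate(public: dict[str, Any], term_map: dict[str, str]) -> dict[str, Any]:
    """Attach ontology term IDs to public values that have a mapping.

    term_map is a hypothetical lookup from a free-text value to an ontology
    term ID; unmapped values are left unchanged ("where possible").
    """
    out: dict[str, Any] = {}
    for key, value in public.items():
        term = term_map.get(value) if isinstance(value, str) else None
        out[key] = {"value": value, "ontology_term": term} if term else value
    return out


if __name__ == "__main__":
    sample = {"sample_id": "S1", "tissue": "colon",
              "phenotype": "example phenotype", "location": "site_A"}
    public, controlled = split_metadata(sample)
    # "EXAMPLE:0001" is a placeholder, not a real ontology term.
    print(annotate(public, {"colon": "EXAMPLE:0001"}))
    print(controlled)
```

Listing what may be public, rather than what must be hidden, is a deliberate choice in this sketch: a field nobody has reviewed stays under controlled access until it is explicitly approved for release.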
The CTMM-TraIT project has developed an infrastructure for molecular profiling data from translational research; this includes storage of processed and clinical data in tranSMART, and computational workflows, accessible via Galaxy, that convert raw experimental data into such processed data. Here we integrated these processed clinical and translational data resources with the large-scale archival processes at EGA, and developed a secure data analysis workflow within a Galaxy framework. Figure 1 shows how the clinical and processed data are managed by tranSMART, which links to the raw data in EGA; these raw data can in turn be accessed and analysed from within a Galaxy environment. This allows the interpreted data stored in tranSMART to be reproduced by a Galaxy workflow, starting from the raw experimental data stored in EGA. Partly because both projects use tranSMART, IMI OncoTrack and CTMM-TraIT have provided complementary use cases for EGA. The EGA submission process has been improved to support large-scale data archiving from TraIT, and improved REST services have been implemented to facilitate the discovery of, and access to, linked data within EGA. We describe how these services can be extended to stream data directly to alternative platforms or locations, such as cloud instances, and how a FUSE layer can be used to access data directly without first decrypting them. To allow access to EGA data from within Galaxy, a streaming tool has been developed that streams secure, encrypted data from EGA directly into a secure Galaxy platform. Both projects have been instrumental in driving development at EGA, especially through the feedback they have provided and through an ongoing effort to make EGA data conform to the FAIR (Findable, Accessible, Interoperable and Reusable) data principles where possible.
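The link from processed data in tranSMART back to raw data in EGA, which is what makes Galaxy-based reproduction possible, can be pictured as a simple provenance record. The Python sketch below is hypothetical: the ProcessedRecord class, the reproduction_plan function and the Galaxy workflow identifier are illustrative names, the accession check assumes EGA file accessions take the form "EGAF" followed by 11 digits, and the code does not call tranSMART, Galaxy or EGA.

```python
import re
from dataclasses import dataclass

# Assumed accession format for EGA files: "EGAF" followed by 11 digits.
EGA_FILE_ACCESSION = re.compile(r"EGAF\d{11}")


@dataclass(frozen=True)
class ProcessedRecord:
    """Hypothetical tranSMART-side record that points back to its provenance."""
    record_id: str                        # identifier of the processed data
    workflow_id: str                      # hypothetical Galaxy workflow identifier
    raw_file_accessions: tuple[str, ...]  # raw inputs archived in EGA


def reproduction_plan(record: ProcessedRecord) -> dict[str, object]:
    """Describe what a Galaxy run would need to regenerate this record.

    Raises ValueError if there are no raw-data references, or if any
    reference is not a well-formed EGA file accession, because the raw
    data could then not be located in the archive.
    """
    if not record.raw_file_accessions:
        raise ValueError("record has no raw-data references")
    bad = [a for a in record.raw_file_accessions
           if not EGA_FILE_ACCESSION.fullmatch(a)]
    if bad:
        raise ValueError(f"not EGA file accessions: {bad}")
    return {
        "workflow": record.workflow_id,
        "inputs": list(record.raw_file_accessions),
        "expected_output": record.record_id,
    }


if __name__ == "__main__":
    # Placeholder accessions: well-formed but not real EGA files.
    rec = ProcessedRecord("expr_matrix_1", "wf_rnaseq",
                          ("EGAF00000000001", "EGAF00000000002"))
    print(reproduction_plan(rec))
```

Validating the raw-data references when the plan is built, rather than when the workflow runs, surfaces broken links between tranSMART and EGA before any data are streamed.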
Authors
J. Dylan Spalding, EMBL-EBI, United Kingdom
David Henderson, Bayer HealthCare Pharmaceuticals, Germany
Reha Yildirimman, Alacris Theranostics GmbH, Germany
Sanne Abeln, VU University Amsterdam, Netherlands
Susanna Repo, ELIXIR Hub, United Kingdom
Niklas Blomberg, ELIXIR Hub, United Kingdom
Alexander Senf, EMBL-EBI, United Kingdom
Jeff Almeida-King, EMBL-EBI, United Kingdom
Jordi Rambla, CRG, Spain
Audald Lloret I Villas, CRG, Spain
Chao Zhang, VU University Amsterdam, Netherlands
Jochem Bijlard, VU University Amsterdam, Netherlands
Youri Hoogstrate, Erasmus University Medical Center, Netherlands
Remond Fijneman, Netherlands Cancer Institute, Netherlands
Andrew P. Stubbs, Erasmus University Medical Center, Netherlands
Jan-Willem Boiten, Lygature, Netherlands
Gerrit Meijer, Netherlands Cancer Institute, Netherlands
Helen Parkinson, EMBL-EBI, United Kingdom
