ET07 – The Genomic HyperBrowser

ELIXIR talk – session: ELIXIR – Tools for data analysis.

Abstract

The Genomic HyperBrowser represents an interdisciplinary effort to tackle the challenges in statistical analysis of genome-scale data. It is an open-ended web server, developed with the aim of supporting nearly any statistical analysis on datasets of information anchored to genomic locations (genomic tracks). It can be used in a broad range of biological investigations, including the analysis of genomic variation, gene regulation, chromatin structure and 3D genome organization. Several published methodologies for the analysis of genomic tracks have been integrated into a single web-based system: the initial implementation of the Genomic HyperBrowser focused on the analysis of single genomic tracks or pairs of tracks; the Disease Regulome added functionality for generating large-scale maps of relations between sets of tracks; HiBrowse added capabilities for analyzing the 3D organization of the genome; ClusTrack added support for highly customizable clustering of genomic tracks; and GSuite HyperBrowser, currently available in a beta version, permits statistical analyses across collections of genomic tracks.

The initial Genomic HyperBrowser was the result of an intense collaboration between experts in biology, statistics, computer science and bioinformatics in Norway. The goal has been to offer an open-ended system able to handle close to any statistical analysis on individual genomic tracks, or on relations between pairs of genomic tracks. A base collection of tracks has been made available in a local repository and can be complemented by tracks provided by the researcher. The core set of statistical analyses is provided through a single tool, “Analyze genomic tracks”, offering a total of 56 descriptive statistics and 20 hypothesis tests. This core analysis functionality is augmented by a collection of data preparation tools, permitting sophisticated biological questions to be examined in an intuitive manner.
An example usage is to ask whether exon boundaries tend to be associated with abrupt changes in DNA melting propensity.

The Disease Regulome was a first resource providing a global overview of the complex relations between transcription factors and disease. More generally, it contributes functionality for generating large-scale maps of track relations. At its core is a novel methodology for performing pairwise, genome-wide comparisons of every combination of genomic tracks from two distinct sets. Several interactive maps of relations between genomic data sets are provided, for example a main map showing how binding sites for different transcription factors co-localize with genes associated with different diseases.

HiBrowse permits visual and statistical analysis of the spatial interaction and co-localization of genomic features. It builds on the core implementation of the Genomic HyperBrowser, extending the core functionality and data representation to encompass graph representation of chromosome conformation capture data (e.g. Hi-C and ChIA-PET). An example usage is to ask whether somatic mutations associated with a particular cancer are spatially co-localized according to chromosome conformation capture data for a relevant cell type.

ClusTrack provides a catalog of biologically meaningful feature extraction approaches and similarity measures for using clustering to analyze relations between multiple genomic tracks. An example usage is to cluster different histone modifications based on co-localization of the genomic regions associated with these modifications in a particular cell type.

GSuite HyperBrowser is a comprehensive, end-to-end web-based analysis system providing a large set of tools for the acquisition, manipulation and statistical analysis of collections of genomic tracks (represented in a simple format, GSuite).
A GSuite file will typically represent the same genomic/epigenomic feature across a range of cell types or represent several different features in the same experimental setting. The system includes tools for conveniently compiling GSuites from major public repositories (ENCODE, Roadmap Epigenomics, EBI, ICGC and more), as well as from local files. By using GSuite as a means of interoperability between tools, analyses involving up to hundreds of genomic tracks can be conveniently specified through a web interface. As the system is integrated with Galaxy, it allows such large-scale analyses to be easily reproduced in exact or modified form by peers, reviewers, and the scientific community in general. An example usage is to extract a track of SNPs associated with a given disease from the GWAS catalog at EBI, compile a GSuite of DNaseI accessibility across cell types from the ENCODE repository, and then ask for which cell types the disease-associated SNPs fall most consistently within DNaseI accessible regions.

All the above methodologies are combined in an integrated system, where the formats GTrack (for individual tracks) and GSuite (for collections of tracks) ensure interoperability and synergy between the methodologies. The Norwegian node of ELIXIR contributes to the development of this integrated system, and ensures a robust delivery of the functionality to the European research community. The system is built on top of the Galaxy platform, and inherits features of the Galaxy system. Most importantly, it enables reproducible analyses where parameters and results are tracked through the Galaxy history. The system is freely available at https://hyperbrowser.uio.no/gsuite
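
As a rough illustration of the GSuite example usage described in the abstract (disease-associated SNPs versus DNaseI accessible regions across cell types), the following minimal sketch ranks a collection of interval tracks by the fraction of SNP positions they cover. It is not the HyperBrowser implementation and performs no hypothesis test; the function names, the single-chromosome half-open coordinates and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort half-open (start, end) intervals and merge overlapping or touching ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_inside(points, intervals):
    """Return the fraction of point positions that fall inside any interval."""
    merged = merge_intervals(intervals)
    starts = [start for start, _ in merged]
    hits = 0
    for p in points:
        # Index of the last merged interval starting at or before p.
        i = bisect_right(starts, p) - 1
        if i >= 0 and p < merged[i][1]:
            hits += 1
    return hits / len(points) if points else 0.0


def rank_tracks(snps, tracks):
    """Rank named interval tracks by the fraction of SNPs they cover (highest first)."""
    scores = {name: fraction_inside(snps, ivs) for name, ivs in tracks.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: SNP positions and accessible regions on one chromosome.
snps = [105, 250, 410, 990]
tracks = {
    "cell_type_A": [(100, 200), (400, 500)],
    "cell_type_B": [(240, 260)],
    "cell_type_C": [(0, 50)],
}
ranking = rank_tracks(snps, tracks)
print(ranking)
```

A descriptive overlap fraction like this only orders the cell types; judging whether an overlap is larger than expected by chance needs a hypothesis test against a suitable null model, which is what the system's statistical tools are described as providing.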

Authors

Boris Simovski, Department of Informatics, University of Oslo, Oslo, Norway
Finn Drabløs, Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
Sveinung Gundersen, Department of Informatics, University of Oslo, Oslo, Norway
Morten Johansen, Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
Diana Domanska, Department of Informatics, University of Oslo, Oslo, Norway
Abdulrahman Azab, Department of Informatics, University of Oslo, Oslo, Norway
Geir Kjetil Sandve, Department of Informatics, University of Oslo, Oslo, Norway
Eivind Hovig, Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway