Application talk
Abstract
Gene Ontology (GO) enrichment analyses have become a standard way to extract information from large lists of genes. They identify GO categories that are over-associated with a set of genes, using the curated associations between GO terms and genes. Here, we propose a new kind of enrichment analysis, applied to terms from an anatomical ontology (Uberon), mapped to genes by expression patterns. This approach is implemented in the tool “TopAnat”, which is available as a web tool (http://bgee.org/?page=top_anat) and as a Bioconductor R package (https://bioconductor.org/packages/release/bioc/html/BgeeDB.html). TopAnat makes it possible to discover in which organs the genes from a set are preferentially expressed. This is not to be confused with a differential gene expression analysis, where gene expression levels are compared between two conditions to detect changes in expression. Rather, TopAnat retrieves the anatomical structures where genes are expressed and, for each anatomical structure, tests whether genes from the list of interest are over-associated with this structure, as compared to a background list of genes. The test is thus analogous to a GO enrichment test, but it examines a new type of property of gene sets: their expression domains. TopAnat is both highly sensitive in detecting organs where genes have an expression bias, and specific enough to return the most relevant and precise terms. TopAnat can be readily used in downstream analyses of any study that yields many candidate genes, e.g., differential expression analyses or Genome-Wide Association Studies.
For instance, we used TopAnat to analyze the expression domains of genes associated with autistic and epileptic disorders in human, from Jabbari and Nürnberg, 2016; TopAnat successfully determined that these genes are preferentially expressed in specific brain regions likely to be associated with these disorders (see http://bgee.org/?page=top_anat#/result/8fce889da7b4519c5792573ed3933032c8122819/). This approach is possible thanks to the integration of gene expression data into the Bgee database (http://bgee.org/). Bgee is a unique resource for retrieving and accurately comparing gene expression patterns in multiple animal species. The database integrates all sources of expression data, from the anatomical detail of in situ hybridization to the genome coverage of RNA-seq. It provides a reference dataset of wild-type, healthy, high-quality and comparable gene expression data in animals. To this end, we perform stringent quality controls, and annotate and re-analyze all RNA-seq, microarray and EST data integrated in Bgee. The database currently includes 17 animal species. Bgee thus provides a meaningful overview of the expression patterns of genes in all integrated species. We precisely re-annotate all RNA-seq, microarray and EST data to a representation of the anatomy and development of each species. We then perform statistical tests to determine: i) where genes are actively expressed over background transcriptional noise; and ii) in which conditions their expression levels are significantly higher than in other conditions. This allows Bgee to capture the spatio-temporal localization of gene expression, in the form of present/absent expression calls and differential over-/under-expression calls, for a range of anatomical entities and developmental or aging stages, in multiple animal species.
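Given "present" expression calls of this kind, the per-structure over-association test can be illustrated with a minimal Python sketch. It assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which is a common choice for GO-style enrichment; this abstract does not state which statistic TopAnat applies, and the function names and data layout below are hypothetical.

```python
from math import comb


def upper_tail_p(n_background, n_annotated, n_study, n_hits):
    """Hypergeometric probability of observing at least `n_hits` annotated
    genes when drawing `n_study` genes from the background."""
    total = comb(n_background, n_study)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_study - k)
        for k in range(n_hits, min(n_annotated, n_study) + 1)
    )
    return tail / total


def anatomical_enrichment(expressed_in, study_genes, background_genes):
    """Return {structure: p-value} for over-association of the study genes.

    `expressed_in` maps each anatomical structure to the genes that have a
    "present" call in it (a hypothetical, simplified input format).
    """
    background = set(background_genes)
    study = set(study_genes) & background  # study list restricted to background
    p_values = {}
    for structure, genes in expressed_in.items():
        annotated = set(genes) & background
        hits = annotated & study
        p_values[structure] = upper_tail_p(
            len(background), len(annotated), len(study), len(hits)
        )
    return p_values
```

In practice the resulting p-values would also need a multiple-testing correction, since one test is run per anatomical structure.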
Annotated gene expression patterns expressed as present/absent calls are surprisingly powerful for detecting biologically relevant information. TopAnat leverages two strengths of Bgee: i) our re-annotation of expression data to anatomical and developmental ontologies; and ii) our statistical analyses, which determine where genes are actively expressed over background transcriptional noise, and thereby transform quantitative expression data into present/absent calls. Together, the annotations and the present/absent calls make it possible to map genes to the organs they are expressed in. Moreover, all data types contribute such information. For example, a TopAnat analysis of all mouse genes annotated to the GO term “neurological systems” recovers the following top three hits: with RNA-seq data only, Ammon's horn, cerebral cortex and cerebellum; with microarray data only, entorhinal cortex, perirhinal cortex and anterior amygdaloid area; with in situ hybridization data only, ganglionic eminence, olfactory epithelium and hypothalamus; and with EST data only, nasal cavity epithelium, organ segment and head organ. Because only healthy wild-type gene expression is annotated in Bgee, the enrichment patterns can be readily interpreted. Another useful feature of TopAnat is that it can detect the most specific terms among the significant results, removing the redundancies generated by the graph structure of the ontology (e.g., retrieving both ‘cerebellum’ and ‘brain’ as significant results). This relies on the R package topGO, which accounts for the topology of the ontology used and provides several decorrelation methods to remove such redundancies. This greatly improves the visualization of the results, which can be difficult to analyze when using a large ontology such as the Uberon anatomical ontology.
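The redundancy problem and one way of reducing it can be illustrated with a simplified Python sketch. It is loosely inspired by the "elim" idea of processing terms from the most specific to the most general, and discounting genes that already explain a significant specific term when testing its ancestors. It is not topGO's implementation: the function names, the `parents` input format, the fixed threshold `alpha` and the use of a plain hypergeometric test are all assumptions made for illustration.

```python
from math import comb


def upper_tail_p(n_background, n_annotated, n_study, n_hits):
    """Hypergeometric P(X >= n_hits)."""
    total = comb(n_background, n_study)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_study - k)
        for k in range(n_hits, min(n_annotated, n_study) + 1)
    )
    return tail / total


def ancestors(term, parents):
    """All terms reachable from `term` through child -> parent links."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def enrich(parents, direct_genes, study, background, alpha=0.01, decorrelate=True):
    """Return {term: p-value}; assumes the ontology is a DAG.

    parents:      {term: [parent terms]}  (hypothetical input format)
    direct_genes: {term: genes with a present call annotated to that term}
    """
    background = set(background)
    study = set(study) & background
    terms = set(parents) | set(direct_genes)
    for parent_list in parents.values():
        terms.update(parent_list)
    anc = {t: ancestors(t, parents) for t in terms}

    # Propagate calls upward: a gene expressed in 'cerebellum' is also
    # considered expressed in 'brain'.
    genes = {t: set() for t in terms}
    for term, direct in direct_genes.items():
        for t in anc[term] | {term}:
            genes[t] |= set(direct) & background

    # A descendant always has strictly more ancestors than any of its
    # ancestors, so this order visits specific terms before general ones.
    p_values = {}
    for term in sorted(terms, key=lambda t: (-len(anc[t]), t)):
        hits = genes[term] & study
        p = upper_tail_p(len(background), len(genes[term]), len(study), len(hits))
        p_values[term] = p
        if decorrelate and p < alpha:
            # Discount these genes when testing the more general terms.
            for a in anc[term]:
                genes[a] -= genes[term]
    return p_values
```

With `decorrelate=False` the sketch reduces to testing every term independently, which is the setting in which a general term such as ‘brain’ is reported alongside ‘cerebellum’ purely because it inherits the same genes.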
TopAnat shows how the combination of a complex ontology (Uberon) and careful annotation of expression data, both quantitative (RNA-seq, microarrays, ESTs) and qualitative (in situ hybridizations), can be made useful to the biological community. We hope that TopAnat will prove as useful as standard GO enrichment analyses, and will be as widely used across a large range of applications, in fundamental academic research as well as in applied industrial research.
Authors
Frederic B. Bastian, University of Lausanne, SIB Swiss Institute of Bioinformatics, Switzerland
Julien Roux, University of Lausanne, Switzerland
Mathieu Seppey, University of Lausanne, Switzerland
Komal Sanjeev, University of Lausanne, Switzerland
Valentine Rech de Laval, Swiss Institute of Bioinformatics (SIB), Université de Lausanne (UNIL), Université de Genève (UNIGE), Switzerland
Philippe Moret, University of Lausanne, Switzerland
Panu Artimo, SIB Swiss Institute of Bioinformatics, Switzerland
Séverine Duvaud, SIB Swiss Institute of Bioinformatics, Switzerland
Vassilios Ioannidis, SIB Swiss Institute of Bioinformatics, Switzerland
Heinz Stockinger, SIB Swiss Institute of Bioinformatics, Switzerland
Marc Robinson-Rechavi, Université de Lausanne, Switzerland
