Presenters
Jacques van Helden, PhD, heads the Laboratory of Genome and network Bioinformatics (BiGRe - ULB, http://www.bigre.ulb.ac.be/). The BiGRe laboratory specializes in the development, evaluation and application of bioinformatics approaches to analyze genome regulation, biomolecular networks, metabolic pathways and mobile genetic elements. Since 1997, Jacques van Helden has been developing the Regulatory Sequence Analysis Tools (RSAT, http://rsat.ulb.ac.be/), a suite of specialized software tools for the analysis of genome regulation that integrates a variety of algorithms for motif discovery, sequence scanning for motifs, phylogenetic footprinting, analysis of ChIP-seq data, etc.
Stein Aerts, PhD, heads the Laboratory of Computational Biology at the University of Leuven (http://med.kuleuven.be/bioinformatics). The lab focuses on the computational identification of cis-regulatory elements, on mapping transcriptional networks, on "omics" data integration, and on next-generation sequencing (NGS) data analysis. Stein Aerts is the developer of the TOUCAN software for regulatory sequence analysis, of several other motif and module discovery algorithms and text-mining applications, of the ENDEAVOUR application for gene prioritization, and of the cisTargetX method for transcriptional target identification in Drosophila.
Motivation
The annotation of the non-coding genome with gene regulatory function lags far behind the annotation of protein-coding genes. Improved annotation will depend both on deeper biological insight into cis-regulatory logic and on more efficient computational prediction algorithms. While the identification of proximal core promoters has improved thanks to better annotation of genes' 5' ends, the determination of distal cis-regulatory modules (CRMs) and the mapping of gene regulatory networks remain challenging. State-of-the-art methods for CRM and transcription factor binding site prediction combine position weight matrices as DNA recognition models with cross-species sequence conservation, clustering of binding sites, and gene expression data. When sets of similar CRMs are available, models incorporating combinatorial binding can be trained and used for genome-wide prediction of novel CRMs. Recent data from high-throughput experiments accelerate the genome-wide identification of regulatory elements but also pose additional bioinformatics challenges. In light of these developments, this tutorial will focus on bioinformatics methods to predict cis-regulatory elements and to aid the process of regulatory annotation.
Goals
Participants will receive an overview of existing resources (databases, tools) and methods for detecting cis-regulatory elements in genome sequences. They will learn to generate testable hypotheses about the binding specificity of transcription factors (motif discovery), their precise binding locations (binding site prediction), and their target genes (target identification), as a step towards regulatory networks. Each session will include a summary of the theory (with slides), illustrated with demos of the existing databases and analysis tools. Hands-on exercises will allow participants to put concepts into practice on selected test cases.
Prerequisites
- A list of software will be provided in advance, so that participants can install it on their laptops
- Some knowledge of the Unix interface
Outline
- Retrieving regulatory sequences from UCSC, EnsEMBL, RSAT, Toucan, ...
Upstream sequences, introns, repeat masking, multi-genome sequence retrieval, ... Access to resources via Web server, Web service or command line
- From binding sites to binding motifs
- Public databases about transcriptional regulation: motifs, sites and regulatory regions
JASPAR, PAZAR, ORegAnno, FlyReg
- Regular expressions
- Position-specific scoring matrices (PSSM)
Concepts: counts, frequencies, pseudo-counts, weights, information content
- PSSM formats and their inter-conversions
- Pattern matching: predicting binding sites
- String-based pattern matching
- Matrix-based pattern matching
- matching scores
- estimation of the risks (P-value distribution)
- Cis-regulatory modules (clusters of motifs)
- Pattern discovery: methods and applications
- String-based pattern discovery
- PSSM-based pattern discovery
- PSSM enrichment detection
- Combinations of PSSMs
- Phylogenetic footprinting
- detecting conserved binding sites by comparative genomics
- Applications
- Clusters of co-expressed genes
- Motif discovery using high-throughput data (ChIP-seq, ...)
- Genome-wide target prediction
- Gene regulatory networks
- Evaluation of prediction performances
- Benchmark datasets
- Strengths and pitfalls of evaluation: sensitivity, positive predictive value, accuracy, ROC curves, ...
- Method comparisons
- Browsing genomes for cis-regulatory elements
- Visualization
- Integration of predictions and annotations
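To give a flavour of the PSSM concepts listed in the outline (counts, frequencies, pseudo-counts, weights, information content) and of matrix-based pattern matching, here is a minimal sketch in Python. It is an illustration only, not the implementation used by RSAT or any other tool covered in the tutorial. The aligned sites are toy data, and the uniform background, the pseudo-count of 1 per column, and all function names are assumptions made for this example.

```python
import math

# Hypothetical aligned binding sites (toy data, not taken from any database).
SITES = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
ALPHABET = "ACGT"
BACKGROUND = {base: 0.25 for base in ALPHABET}  # assumed uniform background
PSEUDO = 1.0  # assumed total pseudo-count per column


def count_matrix(sites):
    """Count each residue at each position of the aligned sites."""
    width = len(sites[0])
    counts = [{base: 0 for base in ALPHABET} for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site):
            counts[i][base] += 1
    return counts


def frequency_matrix(counts, pseudo=PSEUDO):
    """Pseudo-count-corrected frequencies: (n + k * p) / (N + k)."""
    freqs = []
    for col in counts:
        total = sum(col.values())
        freqs.append({base: (col[base] + pseudo * BACKGROUND[base]) / (total + pseudo)
                      for base in ALPHABET})
    return freqs


def weight_matrix(freqs):
    """Log-odds weights: ln(corrected frequency / background frequency)."""
    return [{base: math.log(col[base] / BACKGROUND[base]) for base in ALPHABET}
            for col in freqs]


def information_content(freqs):
    """Per-column information content: sum over residues of f * ln(f / p)."""
    return [sum(col[base] * math.log(col[base] / BACKGROUND[base]) for base in ALPHABET)
            for col in freqs]


def scan(sequence, weights):
    """Score every window of the sequence; return (position, score) pairs."""
    width = len(weights)
    return [(i, sum(weights[j][sequence[i + j]] for j in range(width)))
            for i in range(len(sequence) - width + 1)]


if __name__ == "__main__":
    freqs = frequency_matrix(count_matrix(SITES))
    weights = weight_matrix(freqs)
    position, score = max(scan("AAATGACTCAAA", weights), key=lambda hit: hit[1])
    print("information content per column:", information_content(freqs))
    print("best hit at position", position, "with weight score", score)
```

A raw weight score is hard to interpret on its own, which is why the outline pairs matching scores with an estimation of the risks (P-value distribution); that step is not covered by this sketch.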
Practical issues
- Wireless internet will be available at the conference venue
- An indicative reading list will be available a few weeks before the tutorial
- The hands-on sessions will be organized in cookbook mode: we will distribute protocols that allow participants to solve a selection of test cases
- For the hands-on sessions, participants are encouraged to bring a laptop if possible, or to team up with another participant who has one
- The ECCB tutorial will combine Web servers, stand-alone software, and command-line programs
Dedicated tutorial website
http://rsat.bigre.ulb.ac.be/rsat/tutorial_ECCB_2010/