HIGHLIGHT TALK – THEME: GENOME
ABSTRACT
Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Traditionally, position-specific scoring matrices (PSSMs) are used to model TF binding sites (TFBSs). Here, we describe an approach that builds upon PSSMs and integrates DNA shape features derived from our DNAshape prediction method. Results from 400 human ChIP-seq datasets show that combining DNA shape features (helix twist, minor groove width, propeller twist, and roll) with PSSM sequence-based scores in a machine learning framework consistently improves the accuracy of TFBS predictions. Improvement is also observed when TF flexible models (TFFMs) or a machine learning-based approach are used in lieu of PSSMs. Incorporating DNA shape information is most beneficial for the E2F and MADS-domain TF families. Results from the analysis of MADS-domain TFs highlight the importance of propeller twist at specific positions within the TFBS.
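As a rough illustration of how a PSSM score and DNA shape features might be combined into a single input for a classifier, the following minimal sketch builds one feature vector per candidate site. It is not the authors' implementation: the function names, the pseudocount, the uniform background, the toy count matrix, and the shape values are all assumptions made for this example. In particular, the shape numbers are placeholders rather than DNAshape output, and every feature is given one value per position for simplicity.

```python
import math

BASES = "ACGT"


def pssm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    The pseudocount and uniform background are assumptions of this sketch.
    """
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pssm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def pssm_score(pssm, site):
    """Sum the log-odds of the observed base at each position of the site."""
    if len(site) != len(pssm):
        raise ValueError("site length must match PSSM length")
    return sum(column[base] for column, base in zip(pssm, site))


def feature_vector(pssm, site, shape_features):
    """Concatenate the PSSM score with per-position shape values.

    shape_features maps a feature name to a list of per-position values.
    Names are sorted so that the feature order is reproducible.
    """
    vector = [pssm_score(pssm, site)]
    for name in sorted(shape_features):
        vector.extend(shape_features[name])
    return vector


# Toy count matrix for a 4-bp motif with consensus ACGT (hypothetical).
counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
pssm = pssm_from_counts(counts)

# Placeholder shape values, NOT real DNAshape predictions.
shape = {
    "HelT": [34.1, 33.8, 34.5, 34.0],
    "MGW": [5.1, 4.8, 4.6, 5.0],
    "ProT": [-7.2, -8.1, -6.5, -7.0],
    "Roll": [-1.0, 0.5, -0.8, 0.2],
}

features = feature_vector(pssm, "ACGT", shape)
```

Vectors built this way for bound and background sequences could then be passed to any standard classifier; the choice of classifier is left open here.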
AUTHORS
Anthony Mathelier, Centre for Molecular Medicine Norway (NCMM), University of Oslo, Norway
Beibei Xin, University of Southern California, USA
Tsu-Pei Chiu, University of Southern California, USA
Lin Yang, University of Southern California, USA
Remo Rohs, University of Southern California, USA
Wyeth Wasserman, Centre for Molecular Medicine and Therapeutics, University of British Columbia, Canada
