Proceeding talk – Theme: Proteins.
Abstract
Motivation: Intrinsically disordered regions (IDRs) of proteins play an important role in many biological processes. Two properties of IDRs make them hard to predict: (i) they occur proteome-wide, and (ii) only about 6% of residues are disordered, so the order/disorder classes are highly imbalanced. Most IDR prediction methods rely on sequence profiles to improve accuracy, but generating sequence profiles is time-consuming, which rules out their application to proteome-wide prediction. Methods that do not use sequence profiles, on the other hand, perform much worse than those that do.
Method: This paper formulates IDR prediction as a sequence labelling problem and solves it with a new machine learning method called Deep Convolutional Neural Fields (DeepCNF). To deal with the highly imbalanced order/disorder ratio, we do not train DeepCNF by the widely used maximum-likelihood approach. Instead, we develop a novel approach that trains it by maximizing the AUC (Area Under the ROC Curve), a measure that is not biased by class imbalance.
Availability: http://raptorx2.uchicago.edu/StructurePropertyPred/predict/.
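The reason AUC suits a roughly 6% disorder ratio can be illustrated with a minimal Python sketch. It is not the DeepCNF training procedure from this paper: the function names (`accuracy`, `pairwise_auc`, `smooth_auc`), the toy data, and the sigmoid surrogate are all assumptions made for illustration. The sketch computes AUC as the fraction of (disordered, ordered) residue pairs that are ranked correctly, and shows one common way such a pairwise objective could be made differentiable.

```python
import math

def accuracy(scores, labels, threshold=0.5):
    """Fraction of residues whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)

def pairwise_auc(scores, labels):
    """Exact AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels: 1 = disordered, 0 = ordered.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))

def smooth_auc(scores, labels, sharpness=10.0):
    """Differentiable stand-in for pairwise_auc (illustrative assumption).

    The step function on each score difference is replaced by a sigmoid,
    so the objective has a gradient. Assumes scores lie in [0, 1].
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 / (1.0 + math.exp(-sharpness * (p - n)))
    return total / (len(pos) * len(neg))

# Toy protein: 50 residues, 3 disordered (6%), and a predictor that
# labels every residue as ordered by giving all of them a score of 0.
labels = [1, 1, 1] + [0] * 47
constant_scores = [0.0] * 50

print("accuracy:", accuracy(constant_scores, labels))
print("AUC:", pairwise_auc(constant_scores, labels))
print("smooth AUC:", smooth_auc(constant_scores, labels))
```

In this toy case the predictor that never reports disorder is right on 47 of 50 residues, yet its AUC is 0.5, the same as random ranking. An accuracy-like objective can therefore reward ignoring the minority class, while an AUC objective cannot.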
Authors
Sheng Wang, Department of Human Genetics, University of Chicago, United States
Jianzhu Ma, Toyota Technological Institute at Chicago, United States
Jinbo Xu, Toyota Technological Institute at Chicago, United States
