......................................................................

NJIT Mathematical Biology Seminar

Tuesday, October 6, 2009, 2:30pm
Cullimore Hall 611
New Jersey Institute of Technology

......................................................................


Two machine learning algorithms for selecting and ranking the best predicted protein structures

Hani Z. Girgis

Department of Computer Science, Johns Hopkins University


Abstract

To predict the three-dimensional structure of proteins, many computational methods sample the conformational space, generating a large number of candidate structures. These methods then rank the generated structures using a variety of model quality assessment programs in order to obtain a small set of structures that are most likely to resemble the unknown, experimentally determined structure. Model quality assessment programs suffer from two main limitations: (i) the rank-one structure is not always the best predicted structure; for example, the best predicted structure could be ranked 10th; and (ii) no single assessment method can correctly rank the predicted structures for all target proteins. However, because at least some of the methods often achieve a good ranking, a model quality assessment method based on a consensus of several such methods is likely to perform better. We exploit the advantages of consensus-based methods in our algorithms, Zico and ZicoSTP, which are based on a consensus of five model quality assessment programs. What distinguishes our algorithms from traditional machine-learning-based model quality assessment programs is their hierarchical nature: the algorithms eliminate low-quality structures in two stages and then rank the remaining subset of high-quality structures. A novel aspect of ZicoSTP is its ability to build, on-line, a "custom-trained" hierarchy of general linear models. By "custom-trained" we mean that, for each target protein, the algorithm trains a unique hierarchical model on data related to that target protein. To evaluate our methods, we participated in CASP8 as human predictors. Based on the official CASP8 results, ZicoSTP and Zico outperformed the best-performing server by 6% and 3%, respectively. Our computational methods placed fourth and sixth among the human predictors. Our CASP8 results are based purely on computational methods, without any human intervention.
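
The general idea of consensus scoring followed by two elimination stages and a final ranking can be illustrated with a minimal sketch. This is not the Zico or ZicoSTP implementation, whose details are not given in this abstract: the function names, the keep fractions, and the use of averaged z-scores in place of trained linear models are all assumptions made for illustration. Each candidate structure is assumed to carry one score per assessment program, with higher meaning better.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one program's scores so that programs are comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus_scores(score_table):
    """Average the standardized scores of every program for each candidate.

    score_table maps a candidate name to its list of raw scores, one per
    assessment program (assumed: higher is better, same order for all).
    """
    names = list(score_table)
    n_programs = len(score_table[names[0]])
    columns = [
        zscores([score_table[name][j] for name in names])
        for j in range(n_programs)
    ]
    return {
        name: mean(column[i] for column in columns)
        for i, name in enumerate(names)
    }


def hierarchical_rank(score_table, keep_fractions=(0.5, 0.5)):
    """Discard low-consensus candidates in stages, then rank the survivors.

    keep_fractions is a hypothetical parameter: the share of candidates
    kept at each elimination stage (two stages by default).
    """
    survivors = dict(score_table)
    for fraction in keep_fractions:
        consensus = consensus_scores(survivors)
        n_keep = max(1, int(len(survivors) * fraction))
        kept = sorted(consensus, key=consensus.get, reverse=True)[:n_keep]
        survivors = {name: survivors[name] for name in kept}
    final = consensus_scores(survivors)
    return sorted(final, key=final.get, reverse=True)


if __name__ == "__main__":
    # Made-up scores from five hypothetical assessment programs.
    candidates = {
        "model_a": [0.82, 0.75, 0.90, 0.70, 0.88],
        "model_b": [0.40, 0.55, 0.35, 0.50, 0.45],
        "model_c": [0.78, 0.80, 0.85, 0.72, 0.79],
        "model_d": [0.20, 0.30, 0.25, 0.10, 0.15],
    }
    print(hierarchical_rank(candidates))
```

Recomputing the consensus after each elimination stage means the surviving structures are compared only against each other, which is one simple way to mimic a hierarchy of scoring stages.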




Last Modified: Nov 28, 2007
Horacio G. Rotstein
h o r a c i o @ n j i t . e d u
Last modified: Thu Oct 1 12:01:41 EDT 2009