Abstract
Protein fold classification is essential to the recognition of protein tertiary structure. It is of particular interest for the structural analysis of proteins that have low sequence identity to proteins of known structure. We investigated the protein fold recognition problem with the Committee Support Vector Machine (CSVM), which has proved efficient and effective at parameterizing features of background characteristics in a high-dimensional space. Through CSVM we were able to combine physically and chemically analyzed data with computationally generated data, and we applied the method to all-versus-all multi-class classification. Our classification results are more accurate than those achievable by other methods and are consistent with the SCOP database. Our fold recognition performance improves by more than 9% over non-committee Support Vector Machine methods. In addition, we investigated cores (secondary structures) to examine how their interactions affect tertiary structure. We show that core interactions may improve our fold recognition results and may be applied to template-based tertiary structure prediction.