Reliable and robust detection of coevolving protein residues

Protein Eng Des Sel. 2012 Nov;25(11):705-13. doi: 10.1093/protein/gzs081. Epub 2012 Oct 16.

Abstract

Because cooperative mechanisms between interconnected residues play a critical role in protein function, detecting coevolving residues is important for studying the biological functions of proteins. In this work, we developed a new correlated mutation analysis method that shows substantially better prediction accuracy than all other methods. More importantly, the prediction accuracy of the new method is insensitive to the characteristics of the multiple sequence alignments (MSAs) from which the correlated mutation scores are calculated. Thanks to this property, not only does it perform well even on MSAs generated automatically by sequence homology methods, which allows us to build a fully automatic, easy-to-use server named CMAT, but its performance is also consistently high on MSA columns containing a high fraction of gaps, which greatly extends the applicability of correlated mutation analysis. The key development of this work is that joint probability estimation can be greatly improved by using the sequence profile as prior knowledge, which is shown to be highly beneficial to correlated mutation analysis and its applications. From the computational perspective, we made two important findings: the sequence profile can be used to estimate the pseudocounts, and the consistency rule between joint probabilities and marginal probabilities is important for accurately estimating the joint probability. The web server and standalone program are freely available on the web at http://binfolab12.kaist.ac.kr/cmat/.
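
The two computational findings above can be illustrated with a minimal sketch. It is not the CMAT implementation: the abstract does not give the formulas, so the score used here (mutual information), the names `profile`, `joint_and_marginals` and `mutual_information`, and the `weight` and `pseudo` parameters are all assumptions made for illustration. The "profile" is simply the smoothed residue frequency of each column, pair pseudocounts are taken as the product of the two column profiles, and the marginals are obtained by summing the joint table so that joint and marginal probabilities are consistent by construction.

```python
import math
from collections import Counter

# 20 amino acids plus the gap symbol; input is assumed to use only these.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def profile(col, pseudo=1.0):
    """Smoothed column frequencies, a stand-in for a real sequence profile."""
    counts = Counter(col)
    total = len(col) + pseudo * len(ALPHABET)
    return {a: (counts[a] + pseudo) / total for a in ALPHABET}


def joint_and_marginals(col_i, col_j, weight=1.0, pseudo=1.0):
    """Joint probabilities with profile-derived pseudocounts (assumed form).

    The pseudocount for pair (a, b) is weight * prof_i[a] * prof_j[b].
    Marginals are sums over the joint table, so they agree with it exactly.
    """
    prof_i, prof_j = profile(col_i, pseudo), profile(col_j, pseudo)
    pair_counts = Counter(zip(col_i, col_j))
    total = len(col_i) + weight
    joint = {
        (a, b): (pair_counts[(a, b)] + weight * prof_i[a] * prof_j[b]) / total
        for a in ALPHABET
        for b in ALPHABET
    }
    p_i = {a: sum(joint[(a, b)] for b in ALPHABET) for a in ALPHABET}
    p_j = {b: sum(joint[(a, b)] for a in ALPHABET) for b in ALPHABET}
    return joint, p_i, p_j


def mutual_information(msa, i, j, weight=1.0, pseudo=1.0):
    """Mutual information (in nats) between MSA columns i and j."""
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    joint, p_i, p_j = joint_and_marginals(col_i, col_j, weight, pseudo)
    return sum(
        p * math.log(p / (p_i[a] * p_j[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Toy alignment: columns 0 and 1 covary (A with K, G with E);
# column 2 varies independently of column 0.
toy_msa = ["AKL", "AKL", "GEL", "GEL", "AKV", "GEV"]
score_covarying = mutual_information(toy_msa, 0, 1)
score_independent = mutual_information(toy_msa, 0, 2)
print(score_covarying > score_independent)
```

In this toy alignment the covarying pair of columns scores higher than the independent pair. A real analysis would replace `profile` with a profile built from homologous sequences and would typically weight sequences by redundancy, neither of which is attempted here.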

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Databases, Protein
  • Evolution, Molecular*
  • Molecular Sequence Data
  • Mutation
  • Probability
  • Protein Interaction Mapping
  • Protein Multimerization
  • Proteins / chemistry*
  • Proteins / genetics*
  • Proteins / metabolism
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Proteins