Prediction of function shift in protein families

This is a thesis from Stockholm: Karolinska Institutet, Department of Cell and Molecular Biology

Abstract: With the availability of a large number of complete genome sequences, it has become essential to annotate the protein sequences derived from them as precisely as possible. Although currently available computational methods can predict broad functionality for most protein sequences, there is room for improvement in obtaining more precise functional annotation. Analysis of functional conservation and divergence in protein families can improve the quality of annotation for available genome sequences. Such an analysis adds an extra level of usefulness to protein families, as it predicts which subgroups in a family share identical functions and which subgroups are likely to have diverged in function. Many genes of pharmacological interest occur in large families for which understanding the specific function is important. This thesis describes a large-scale analysis of functional shifts in protein families. Initially, we created a large dataset of protein families and subfamilies with known functional differences and assessed how well function shifts can be predicted using existing methods for identifying subfamily-specific functional residues. We showed that these methods can discriminate between same-function and different-function subfamilies, achieving a prediction accuracy of 71%. This approach predicted many previously unknown cases of function divergence (Paper I). We introduced a new measure for predicting function shift that is representative of all positions in the alignment; by combining it with previously proposed measures, we further improved function shift prediction (Paper II). We developed a web resource, freely available to the public, for disseminating subfamily classification and function shift analysis of protein families (Paper III).
We analyzed multi-species ortholog groups for functional shifts using the proposed methods and predicted many new cases of functional shifts between ortholog and paralog subfamilies (Paper IV). This work demonstrates the power of classifying protein families into subfamilies, combined with function shift analysis, for better annotation of the protein sequences emerging from genome sequencing efforts. The methods and resources developed as part of this thesis represent a valuable resource for scientists elucidating detailed functional aspects of proteins, thereby aiding evolutionary studies, comparative genomics and drug design for human diseases.
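The abstract does not spell out the scoring functions used in Papers I and II. As a rough, hypothetical illustration of the general idea of subfamily-specific residues, the sketch below flags alignment columns that are conserved within each of two subfamilies but occupied by different residues between them, and reports their fraction of the alignment as a crude divergence score. The function names, the `min_frac` conservation threshold and the toy alignment are assumptions made for illustration only; this is not the measure developed in the thesis.

```python
from collections import Counter


def conserved_residue(column, min_frac=1.0):
    """Return the residue making up at least min_frac of the non-gap
    characters in an alignment column, or None if no residue does.
    (Hypothetical helper for illustration.)"""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(residues) >= min_frac else None


def specificity_positions(subfam_a, subfam_b, min_frac=1.0):
    """Indices of columns conserved within each subfamily but with a
    different conserved residue in each (a simple notion of a
    subfamily-specific position; assumed, not the thesis's method)."""
    length = len(subfam_a[0])
    if any(len(s) != length for s in subfam_a + subfam_b):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        a = conserved_residue([seq[i] for seq in subfam_a], min_frac)
        b = conserved_residue([seq[i] for seq in subfam_b], min_frac)
        if a is not None and b is not None and a != b:
            positions.append(i)
    return positions


def divergence_score(subfam_a, subfam_b, min_frac=1.0):
    """Fraction of alignment columns that are subfamily-specific."""
    positions = specificity_positions(subfam_a, subfam_b, min_frac)
    return len(positions) / len(subfam_a[0])


# Toy aligned subfamilies (made-up sequences, for illustration only).
subfam_a = ["ACDEF", "ACDEF", "ACDQF"]
subfam_b = ["ACKEF", "ACKEW", "ACKEW"]
print(specificity_positions(subfam_a, subfam_b))  # [2]
print(divergence_score(subfam_a, subfam_b))       # 0.2
```

With strict conservation (`min_frac=1.0`) only column 2 (D versus K) qualifies; relaxing the threshold to 0.6 also admits column 4 (F versus W). A higher score would loosely suggest a functional difference between the two subfamilies, but any real predictor would need calibration against subfamilies with known functions, as the thesis describes.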