Assignment and assessment of orthology and gene function
Abstract: Genomes from several species have been sequenced in recent years, most notably the human genome. An important task of computational biology is to classify and functionally annotate the large amount of sequence data produced by these genome sequencing projects. The concepts of orthology and paralogy, introduced by Fitch over 30 years ago, play an important role in this task. Orthologous genes are genes in different species that evolved from a single gene in the last common ancestor of those species. Paralogous genes are genes that arose through a duplication event. Orthologs can be seen as different versions of the same gene in different species, so they are likely to have the same functional properties and to play a similar biochemical role in the cell. Once an ortholog of a newly sequenced gene is known, the annotation of that ortholog can give reliable information about the function and role of the new gene. The main focus of this work was to improve existing approaches to orthology inference and to develop new ones. We developed a novel method, called ortholog bootstrapping, for identifying orthologs in a gene tree. Instead of assigning orthology from a single gene tree, ortholog bootstrapping analyzes multiple trees calculated for the same gene family. The trees are reconstructed using the bootstrap technique, which allows us to calculate bootstrap support values for orthologous sequence pairs. Ortholog bootstrapping was then used to find orthologs between species with completely sequenced genomes. Here we employed a scheme for the hierarchical clustering of species based on their evolutionary history. Orthology inference was performed at the domain level, using the Pfam domain definitions. The results of the analysis were compared with those of a tree reconciliation method that uses a complete species tree for orthology inference.
The comparison was based on a test set of putative orthologous proteins with experimentally characterized functional properties. It showed that our approach increases the sensitivity of ortholog assignment from a gene tree. The orthologous relations found using our approach were stored in a database, which is available over the Internet and can be accessed through a previously developed Java applet for visualizing phylogenetic relations between domains. In addition to inferring orthology by phylogenetic means, we developed a method for assigning orthology based on pairwise sequence similarity. It focuses on the correct separation of paralogs and on the calculation of an orthology confidence value.
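The idea behind ortholog bootstrapping, counting how often a sequence pair is inferred to be orthologous across a set of bootstrap trees, can be illustrated with a minimal Python sketch. This is not the implementation used in the thesis. Each tree is assumed to be a rooted binary tree written as nested tuples, leaf labels follow a hypothetical "species|gene" format, and a simple species-overlap rule (a node counts as a speciation when its two subtrees share no species) stands in for the species-tree-based inference described above. The four toy trees are invented purely for illustration.

```python
from collections import Counter


def species_of(leaf):
    # Assumed label format: "species|gene" (hypothetical convention).
    return leaf.split("|")[0]


def leaves(tree):
    # A tree is either a leaf label (str) or a pair (left, right).
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def ortholog_pairs(tree):
    """Return the leaf pairs whose last common ancestor is a speciation.

    Species-overlap rule (an assumption of this sketch): a node is treated
    as a speciation if its two subtrees share no species, and as a
    duplication otherwise.
    """
    if isinstance(tree, str):
        return set()
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    left_species = {species_of(x) for x in left_leaves}
    right_species = {species_of(x) for x in right_leaves}
    pairs = set()
    if left_species.isdisjoint(right_species):
        for a in left_leaves:
            for b in right_leaves:
                pairs.add(frozenset((a, b)))
    return pairs | ortholog_pairs(left) | ortholog_pairs(right)


def ortholog_support(bootstrap_trees):
    """Fraction of bootstrap trees in which each pair is orthologous."""
    counts = Counter()
    for tree in bootstrap_trees:
        counts.update(ortholog_pairs(tree))
    total = len(bootstrap_trees)
    return {pair: count / total for pair, count in counts.items()}


# Invented toy data: four bootstrap trees for one gene family.
tree_a = (("human|A1", "mouse|A1"), ("human|A2", "mouse|A2"))
tree_b = (("human|A1", "mouse|A2"), ("human|A2", "mouse|A1"))
support = ortholog_support([tree_a, tree_a, tree_b, tree_a])

for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), value)
```

In this toy example the pair human|A1 and mouse|A1 is supported by three of the four trees, so a support threshold could be used to separate well-supported ortholog assignments from uncertain ones.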