Retroviral long Terminal Repeats; Structure, Detection and Phylogeny

This is a dissertation from Uppsala: Acta Universitatis Upsaliensis

Abstract: Long terminal repeats (LTRs) are non-coding repeats flanking the protein-coding genes of LTR retrotransposons. The variability of LTRs makes them challenging to study. Hidden Markov models (HMMs), probabilistic models widely used in pattern recognition, are well suited to handling this variability. The main aim of this work was to study the LTRs of retroviruses and LTR retrotransposons using HMMs.

Paper I describes the methodology of HMM modelling applied to different groups of LTRs from exogenous retroviruses (XRVs) and endogenous retroviruses (ERVs). The detection capabilities of the HMMs were assessed and found to be high for homogeneous groups of LTRs. The alignments generated by the HMMs displayed conserved motifs, some of which could be related to known functions of XRVs. The features common to the different groups of retroviral LTRs were investigated by combining the groups into a single alignment. These features were the short inverted terminal repeats TG and CA and three AT-rich stretches, which provide retroviruses with TATA boxes and AATAAA polyadenylation signals.

In Paper II, phylogenetic trees of three groups of retroviral LTRs were constructed using HMM-based alignments. The LTR trees were consistent with trees based on other retroviral genes, suggesting co-evolution between LTRs and these genes.

In Paper III, the methods of Papers I and II were extended to LTRs from other retrotransposon groups, covering much of the diversity of all known LTRs. For the first time, an LTR phylogeny could be constructed. There was no major disagreement between the LTR tree and trees based on three different domains of the Pol gene. The conserved LTR structure described in Paper I was found to apply to all LTRs. Putative integrase recognition motifs extended up to 12 bp beyond the short inverted repeats TG/CA.

Paper IV is a review article describing the use of sequence similarity and structural markers for the taxonomy of ERVs. ERVs were originally classified into three classes according to the length of the target site duplication. While this classification is useful, it does not include all ERVs. A naming convention based on previous ERV and XRV nomenclature, but taking newer information into account, is advocated in order to provide a practical yet coherent scheme for dealing with new, unclassified ERV sequences.

Paper V gives an overview of bioinformatics tools for studies of ERVs and of retroviral evolution before and after endogenization. It gives some examples of recent integrations in vertebrate genomes and discusses the pathogenicity of human ERVs, including their possible relation to cancers.

In conclusion, HMMs were able to detect and align LTRs successfully. Progress was made in understanding their conserved structure and phylogeny. The methods developed in this thesis could be applied to other kinds of non-coding DNA sequence elements.
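
The conserved LTR features reported for Paper I (terminal TG and CA dinucleotides, a TATA box, and an AATAAA polyadenylation signal) can be illustrated with a small Python sketch. This is not the HMM methodology of the thesis: it uses plain exact string matching, which is far less tolerant of sequence variability than an HMM. The function name `ltr_features` and the toy sequence are hypothetical and serve only as an illustration.

```python
def ltr_features(seq: str) -> dict:
    """Report simple LTR hallmarks in a DNA sequence.

    Naive exact matching only; an assumption made for illustration,
    not the HMM-based detection described in the abstract.
    """
    s = seq.upper()
    return {
        # Short inverted terminal repeats: TG at the 5' end, CA at the 3' end.
        "starts_with_TG": s.startswith("TG"),
        "ends_with_CA": s.endswith("CA"),
        # Index of the first exact match, or -1 if the motif is absent.
        "tata_pos": s.find("TATA"),
        "polya_signal_pos": s.find("AATAAA"),
    }


# Made-up toy sequence, not a real LTR.
toy_ltr = "TGAAGGCTATAAAGGCCAATAAAGCCTTCA"
features = ltr_features(toy_ltr)
print(features)
```

Real LTRs diverge too much for exact matching to be reliable, which is the motivation the abstract gives for using probabilistic models instead.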