Determination of transcription factor binding specificities

Abstract: The term "genetic code" refers to the way in which the information encoded in nucleic acids is converted into the amino-acid sequence of proteins. There is, however, also a second genetic code, one that cells use to read the blueprint of the entire organism. This second genetic code is composed of gene regulatory information that specifies how much of a gene product should be made, when, and where. This information is read primarily by sequence-specific DNA-binding proteins called transcription factors (TFs). TFs recognize and bind short DNA sequences located in regions of DNA that are either directly adjacent or relatively close to their target genes. When bound to these sites, TFs directly regulate transcription rates by recruiting the general transcription machinery or by inhibiting its recruitment. Alternatively, TFs can influence transcription rates indirectly by recruiting proteins that change the local chromatin environment in a way that promotes or inhibits transcription. Each TF has its own target specificity: it binds to a range of similar sequences that can be ranked by their relative binding strengths. A major gap in our understanding of life is the lack of knowledge of TF DNA-binding specificities. While we have good estimates of the total number of TFs and their general types, we do not yet understand how the gene regulatory instructions are encoded in the genome. To approach this important question, we first need to know which DNA sequences TFs bind and how strongly. The aim of this thesis project was to develop efficient methods for the characterization of TF binding specificities and then to use these methods to catalogue the DNA-binding specificities of as many human TFs as possible.
In Study I, we converted the classical Systematic Evolution of Ligands by Exponential Enrichment (SELEX) assay into a high-throughput method (HT-SELEX) and demonstrated it by analyzing the DNA-binding specificities of 18 TFs representing 14 structural classes. Some of the results were validated against in vivo data from chromatin immunoprecipitation assays. In Study II, we used HT-SELEX to analyze the binding specificities of clones representing almost all human TFs, generating a dataset of high-resolution DNA-binding specificity models for more mammalian TFs than in the entire previously published literature combined. Another major feature of our dataset is its high consistency, which was achieved by performing all of the experiments in parallel with the same method. In Study III, we studied the evolution of gene regulation by analyzing the DNA-binding specificities of TFs from the fruit fly Drosophila melanogaster. The analysis showed that even though the common ancestor of humans and insects lived over 600 million years ago, TF binding specificities were highly conserved between these species, and almost all of the TFs in either species had a similar counterpart in the other.
