Analysis of large-scale metagenomic data
Abstract: This thesis concerns the analysis of large DNA sequence data sets produced by modern high-throughput DNA sequencing machines. Using such machines to sequence the genetic content of a microbial community produces a metagenome. The thesis comprises three research papers, all connected to the study of large metagenomic data sets. In the first paper, we developed a method that uses hidden Markov models to identify fragments of fluoroquinolone antibiotic resistance genes (qnr genes) in short DNA sequences. Cross-validation showed that the method has high statistical power even for fragments as short as 100 base pairs, a length commonly encountered in modern next-generation sequencing data. The second paper is a follow-up study in which the putative qnr genes identified in the first paper were verified using wet-lab experiments. An expression system for qnr genes in Escherichia coli hosts was developed and used to evaluate the resistance phenotype of the novel gene candidates. In the third paper, we developed an easy-to-use, high-performance method for distributed gene quantification in metagenomic sequence data. It leverages high-performance computing resources to provide high throughput while maintaining sensitivity, enabling efficient and accurate gene quantification suitable for comparative metagenomics. Next-generation DNA sequencing has had a major impact on molecular biology. As the size of the produced data sets increases, so does the need for methods suited to analyzing them. This thesis presents several new methods that are well adapted to the analysis of modern terabase-sized metagenomic data sets.