Novel methods to study genomic fragility and structural variation

Abstract: DNA double-strand breaks (DSBs) are major DNA lesions that, when repaired unfaithfully, can give rise to loss of genetic information and to chromosomal rearrangements such as insertions/deletions (indels) and copy number alterations (CNAs), which in turn lead to the genomic instability that is characteristic of almost all cancer types. In this context, genomic instability is thought to play critical roles in cancer initiation, progression and intra-tumor heterogeneity (ITH). DSBs have also been exploited for genome editing, where different CRISPR (clustered regularly interspaced short palindromic repeats) systems are used to create DSBs in the target DNA in order to alter sequences and modify gene function. However, this approach is not without drawbacks, as DSBs can be created at sites other than the intended target (known as off-target effects), which can potentially be mutagenic. Therefore, given the importance of DSBs in genomic instability and their role in the generation of CNAs and in genome-editing technologies, it is of great interest to determine the genomic locations of DSBs and their frequency along the genome, together with DNA copy number profiling. Thus, the focus of this thesis was to develop molecular tools for the detection and quantification of DSBs with single-nucleotide resolution in different model systems, in combination with technologies for DNA copy number profiling. Together, these tools allow us to understand the biology behind DSBs and their links to CNAs in the context of cancer, and to assess the safety profile of CRISPR systems for therapeutic applications. In Paper I, we developed BLISS (Breaks Labeling In Situ and Sequencing), a quantitative method for genome-wide DSB profiling. We showed that BLISS accurately identified both endogenous and drug-induced DSBs genome-wide, even in samples of a few thousand cells and in single tissue sections.
Additionally, we demonstrated that BLISS is a powerful tool for measuring the off-target activity of the Cas9 and Cpf1 CRISPR systems, and found that Cpf1 was more specific than Cas9. In Paper II, using BLISS-generated DSB data from cell lines, we modeled the contribution of genetic and epigenetic features to shaping genomic fragility in cancer, and predicted the expected frequency of breaks across the human genome. We constructed random forest regression models from four DSB datasets and found that, across all models, replication timing was the most influential feature for predicting DSB frequency. In addition, we observed that open chromatin at transcriptionally active genes and associated regulatory factors have a larger influence on DSB frequency than transcription per se. In Paper III, we developed CUTseq, which builds on the design of BLISS from Paper I and can be used to barcode and amplify genomic DNA (gDNA) to generate multiplexed DNA sequencing libraries for reduced representation sequencing of DNA samples extracted from cell lines, formalin-fixed, paraffin-embedded (FFPE) tissue sections or small sub-regions thereof. We demonstrated the applicability of CUTseq for CNA profiling, and showed that CUTseq can reproducibly detect a considerable fraction of the high-confidence single nucleotide variants (SNVs) that were also detected by a standard exome capture method. Finally, we demonstrated that CUTseq can be applied to multi-region tumor sequencing to assess ITH of CNA profiles across multiple small regions of single FFPE tissue sections of primary and metastatic breast cancer lesions.
