Bioinformatics, evolution and revolution in our understanding of toxin-antitoxin systems

Abstract: Bacteria experience a wide range of natural challenges during their life cycles, to which they must respond and adapt in order to survive. Under stress conditions such as amino acid starvation, bacteria slow their growth by producing the small alarmone nucleotides guanosine pentaphosphate (pppGpp) and tetraphosphate (ppGpp), collectively referred to as (p)ppGpp. Accumulation of (p)ppGpp results in a comprehensive alteration of cellular metabolism. The alarmone (p)ppGpp is produced and degraded by enzymes belonging to the RelA/SpoT Homologue (RSH) protein family, named for their sequence similarity to the RelA and SpoT enzymes of Escherichia coli. The members of the RSH protein family can be classified into long multi-domain RSHs and short single-domain RSHs. Long RSH enzymes such as RelA and SpoT carry both the (p)ppGpp hydrolysis (HD) domain and the (p)ppGpp synthesis (SYNTH) domain in the enzymatic N-terminal domain region (NTD), in combination with additional domains in the regulatory C-terminal domain region (CTD). Short single-domain RSHs can be divided into small alarmone hydrolases (SAHs), which carry only the HD domain, and small alarmone synthetases (SASs), which carry only the SYNTH domain.
At the beginning of my PhD, I studied the diversity of RSH proteins across the tree of life. I identified 35,615 RSH proteins from analyses of 24,072 genomes. I used large-scale phylogenetic analyses to classify the RSH proteins into 13 long RSH, 11 SAH and 30 SAS subfamilies. To address why bacteria often carry multiple SASs in the same genome, and to predict new functions, I developed a computational tool called FlaGs (standing for Flanking Genes) to analyse the conservation of genomic neighbourhoods in large datasets. I also developed a web-based version of FlaGs, called webFlaGs, that is publicly accessible and is used by biologists all over the world.
The application of FlaGs to SAS RSHs led to the discovery that multiple SAS subfamilies are encoded in conserved and frequently overlapping two- or three-gene architectures, reminiscent of toxin-antitoxin (TA) systems. Five SAS representatives from the FaRel, FaRel2, PhRel, PhRel2 and CapRel subfamilies were experimentally validated as toxins (toxSASs) that are neutralised by the products of six neighbouring antitoxin genes. The toxSAS enzyme FaRel from Cellulomonas marina is encoded as the central gene of a conserved three-gene architecture and acts through the production of the nucleotide alarmone ppGpp and its unusual toxic analogue ppApp, causing a significant depletion of ATP and GTP. FaRel toxicity can be countered by both the downstream and the upstream cognate antitoxin. The latter contains a SAH domain that neutralises toxicity through degradation of ppGpp as well as ppApp. Combining phylogenetic and FlaGs analyses, we have discovered that the domain of unknown function DUF4065 is a widely distributed antitoxin domain in putative TA-like operons with dozens of distinct toxic domains. Nine DUF4065-containing antitoxins and their cognate toxins were experimentally validated as TA pairs using toxicity neutralisation assays. These antitoxins form complexes with their diverse cognate toxins. Given the versatility of DUF4065, we have renamed this domain Panacea. We hypothesise that there are multiple hyperpromiscuous antitoxin domains like Panacea that can be associated with many non-homologous toxin domains, which may themselves be hyperpromiscuous. Thus, TA systems across all bacteria can be represented as a network of toxin and antitoxin domain combinations, with hyperpromiscuous domains being hubs in the network.
To test this hypothesis and compute such a network, I developed a new, iterative version of FlaGs, called NetFlax (standing for Network-FlaGs for toxins and antitoxins), that can identify TA-like gene architectures in an unsupervised manner and generate a toxin-antitoxin domain interaction network. The results from NetFlax support our strategy, since we have rediscovered multiple previously characterised TAs as well as identifying entirely new ones. We find that Panacea is unusual but not unique in its hyperpromiscuity, and our network indicates the presence of novel hyperpromiscuous domains still to be explored. Our findings also demonstrate how a core network of TAs is evolutionarily linked to various accessory genome systems, including conjugative transfer and phage defence mechanisms. The existing network can potentially serve as a framework onto which future discoveries of the biological role of TAs can be mapped.
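As a purely illustrative aside, the idea of hyperpromiscuous domains acting as hubs can be sketched with a small graph built from toxin-antitoxin domain pairs. This is not the NetFlax implementation: the function names, the `min_partners` threshold and the placeholder domain labels (`toxin_A`, `antitoxin_X`, and so on) are assumptions made for the example, and only the label Panacea is taken from the abstract.

```python
from collections import defaultdict

# Hypothetical (toxin domain, antitoxin domain) pairs; placeholder labels only.
EXAMPLE_PAIRS = [
    ("toxin_A", "Panacea"),
    ("toxin_B", "Panacea"),
    ("toxin_C", "Panacea"),
    ("toxin_A", "antitoxin_X"),
]


def build_network(pairs):
    """Map each (role, domain) node to the set of partner domains it pairs with."""
    partners = defaultdict(set)
    for toxin, antitoxin in pairs:
        partners[("toxin", toxin)].add(antitoxin)
        partners[("antitoxin", antitoxin)].add(toxin)
    return partners


def find_hubs(partners, min_partners=3):
    """Return nodes with at least `min_partners` distinct partners, largest first.

    The threshold is an arbitrary illustrative cut-off, not a value from the thesis.
    """
    hubs = [(node, len(p)) for node, p in partners.items() if len(p) >= min_partners]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    network = build_network(EXAMPLE_PAIRS)
    for (role, domain), degree in find_hubs(network):
        print(f"{role} domain {domain} pairs with {degree} partner domains")
```

In this toy input, the antitoxin node labelled Panacea is the only one that reaches the threshold, which mirrors the notion of a hub: a single domain connected to several non-homologous partner domains.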
