Transmembrane Proteins and Protein Structure Prediction: What We Can Learn from Computational Methods

Abstract: A protein’s 3D structure is essential for understanding how proteins function and interact and how biochemical processes proceed in living organisms. Despite advances in experimental methods, determining protein structure experimentally remains expensive and time-consuming. Machine learning and computational methods have advanced significantly, and in many cases they can now produce high-quality models of protein structure. Computational methods help predict protein 3D structure and are often used as a complement to experimental methods to give better insight into and understanding of biological processes. This thesis presents studies focusing on the simplicity and transparency of the 3D structure prediction pipeline. This is achieved with a new interactive database that gives full access to the pipeline’s data and code, together with tools to analyse and compare models and structures. I present a new module for the last step in this pipeline, the final folding of the protein chain, which both simplifies the current pipeline and uses new input data based on current research. This module predicts better models than its predecessor and produces models more than an order of magnitude faster than current state-of-the-art tools. The module also offers a novel way of folding and docking dimers in a single step. There are many examples of machine learning models that contain biases originating in biased training data, resulting in models that do not generalise well. I present a study in which experts collaborate to create a high-quality database of intrinsically disordered proteins. Through manual annotation and quality protocols, the study has produced high-quality training data that is well suited for machine learning tasks and protein disorder analysis. In this thesis, I also present computational methods pertaining to transmembrane proteins and show how they can increase our insight into membrane protein structure.
In one study, we combine computational and experimental methods to investigate how oppositely charged residue pairs that form salt bridges inside the membrane change the insertion potential of membrane proteins. We show that amino acid pairs that form salt bridges in this setting contribute 0.5-0.7 kcal/mol to the apparent free energy of membrane insertion. This gives new insight into how insertion is calculated and can lead to better membrane protein topology predictors. In the final study, we investigate the CPA/AT transporter family of transmembrane proteins and create a new integrated topology annotation method and structural classification, resulting in new insight into how this family has evolved. The entire pipeline is published as an interactive database with complete transparency for both the method and the data used. The study shows that this family has evolved by duplicating internal regions and that this has produced structural symmetry within the family. This thesis therefore contributes to a more accessible and transparent way of using computational methods to gain broader insight into protein structure prediction and into how these structures relate to biochemical processes.
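To give a sense of scale for the 0.5-0.7 kcal/mol salt-bridge contribution mentioned above, the following minimal sketch converts a shift in apparent free energy into a change in the ratio of inserted to non-inserted protein. It assumes a simple two-state equilibrium model, a temperature of 298.15 K, and that the salt bridge lowers the apparent free energy (favours insertion); none of these assumptions, nor the function names, come from the thesis itself.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def insertion_fraction(dg_app, temperature=298.15):
    """Fraction of inserted protein in an assumed two-state model.

    dg_app is the apparent free energy of insertion in kcal/mol;
    negative values favour insertion.
    """
    return 1.0 / (1.0 + math.exp(dg_app / (R_KCAL * temperature)))


def ratio_fold_change(ddg, temperature=298.15):
    """Factor by which the inserted/non-inserted ratio changes
    when the apparent free energy shifts by ddg kcal/mol."""
    return math.exp(-ddg / (R_KCAL * temperature))


if __name__ == "__main__":
    # Assumption: the salt bridge is stabilising, so the shift is negative.
    for ddg in (-0.5, -0.7):
        print(f"ddG = {ddg:+.1f} kcal/mol -> ratio changes by a factor of "
              f"{ratio_fold_change(ddg):.2f}")
```

Under these assumptions, a shift of this size changes the inserted to non-inserted ratio by roughly a factor of two to three, which is why even sub-kcal/mol terms can matter for topology prediction of marginally hydrophobic helices.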
