Managing and Exploring Large Data Sets Generated by Liquid Separation - Mass Spectrometry

This is a thesis from Uppsala: Acta Universitatis Upsaliensis

Abstract: A trend in natural science, and especially in analytical chemistry, is the increasing need to analyze large numbers of complex samples with low analyte concentrations. Biological samples (urine, blood, plasma, cerebrospinal fluid, tissue, etc.) are often suitable for analysis with liquid separation mass spectrometry (LS-MS), resulting in two-way data tables (time vs. m/z). Such biological 'fingerprints', taken for all samples in a study, amount to a large volume of data. Detailed characterization requires a high sampling rate in combination with high mass resolution and a wide mass range, which presents a challenge in data handling and exploration. This thesis describes methods for managing and exploring large data sets made up of such detailed 'fingerprints' (represented as data matrices).

The methods were implemented as scripts and functions in Matlab, a widely used environment for matrix manipulation. A single-file structure holding the imported data facilitated both easy access and fast manipulation. Routines for baseline removal and noise reduction were intended to reduce the amount of data without losing relevant information. A tool for visualizing and exploring single runs was also included. When two or more 'fingerprints' are compared, they usually have to be aligned because of unintended shifts in analyte positions in time and m/z. A PCA-like multivariate method proved to be less sensitive to such shifts, and an ANOVA implementation made it easier to find systematic differences within the data sets.

The above strategies and methods were applied to complex samples such as plasma, protein digests, and urine. The applications included urine profiling (paracetamol intake; beverage effects), peptide mapping (different digestion protocols), and the search for potential biomarkers (appendicitis diagnosis). The influence of the experimental factors was visualized by PCA score plots as well as clustering diagrams (dendrograms).
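As a rough illustration of the data layout described above, the sketch below treats each run as a (time x m/z) matrix, subtracts a simple baseline, and computes PCA scores across runs. It is not the thesis code: the thesis used Matlab, while this sketch assumes Python with NumPy. The median-based baseline, the ordinary PCA via SVD (not the shift-tolerant PCA-like method mentioned in the abstract), the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def remove_baseline(run):
    """Subtract a per-m/z-channel baseline, estimated as the median over time.

    Assumption: a constant baseline per channel; negative values are clipped to 0.
    """
    baseline = np.median(run, axis=0, keepdims=True)
    return np.clip(run - baseline, 0.0, None)


def pca_scores(runs, n_components=2):
    """Unfold each (time x m/z) run into one row, mean-center, project via SVD."""
    X = np.stack([r.ravel() for r in runs])
    X = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


# Synthetic 'fingerprints': constant baseline plus noise, with one peak
# present only in the second group of runs (hypothetical data).
rng = np.random.default_rng(0)
n_time, n_mz = 50, 40


def synthetic_run(peak_height):
    run = 5.0 + rng.normal(0.0, 0.1, size=(n_time, n_mz))
    run[20:25, 10] += peak_height  # peak at time indices 20..24, m/z channel 10
    return run


runs = [synthetic_run(h) for h in (0.0, 0.0, 0.0, 10.0, 10.0, 10.0)]
cleaned = [remove_baseline(r) for r in runs]
scores = pca_scores(cleaned)  # one row of scores per run
```

Plotting the first column of `scores` against the second would give a basic score plot in which runs with and without the peak are expected to fall into separate groups.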