Confidence Predictions in Pharmaceutical Sciences

Abstract: The main focus of this thesis has been on Quantitative Structure-Activity Relationship (QSAR) modeling using methods that produce valid measures of uncertainty. The goal of QSAR is to prospectively predict assay outcomes for novel compounds, such as ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) endpoints and on- and off-target interactions. QSAR modeling offers an appealing alternative to laboratory work, which is both costly and time-consuming, and it can be applied earlier in the development process, since candidate drugs can be tested in silico without first having to be synthesized. A common theme across the presented papers is the application of conformal and probabilistic prediction models, which associate each prediction with a measure of its reliability, a desirable property that is essential at the decision-making stage. In Paper I we studied how biological assay data from legacy systems can be used to improve predictive models. This is otherwise problematic, since mixing data from separate systems causes issues for most machine learning algorithms. We demonstrated that old data could be used to augment the proper training set of a conformal predictor, yielding more efficient predictions while preserving model calibration. In Paper II we studied a new approach for predicting metabolic transformations of small molecules based on transformations encoded in SMIRKS format. In this work we used the probabilistic Cross-Venn-ABERS predictor, which worked well overall but had difficulty modeling the minority class of imbalanced datasets. In Paper III we studied metabolomics data from patients diagnosed with Multiple Sclerosis and found a set of 15 discriminatory metabolites that could be used to classify patients from a validation cohort into one of two subtypes of the disease with high accuracy.
We further demonstrated that conformal prediction could be useful for tracking disease progression in individual patients, which we exemplified using data from a clinical trial. In Paper IV we introduced CPSign, a software package for cheminformatics modeling using conformal and probabilistic methods. CPSign was compared against other commonly used methods for this task on 32 benchmark datasets, demonstrating that CPSign achieves predictive accuracy on par with the best-performing methods.
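The data-splitting idea behind Paper I can be illustrated with a small sketch. The abstract does not state which underlying model or nonconformity measure was used, so the following assumes a label-conditional (Mondrian) inductive conformal classifier with a 1-nearest-neighbour distance-ratio nonconformity score. Legacy data only enlarge the proper training set, while the calibration set is drawn solely from the current assay system, which is what keeps the p-values calibrated with respect to current data. All function names (`nonconformity`, `mondrian_icp`, `prediction_set`) and the toy data points are hypothetical and are not taken from the thesis or from CPSign.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class label); hypothetical layout


def nearest_distance(x: Sequence[float], examples: List[Example],
                     keep: Callable[[int], bool]) -> float:
    """Smallest Euclidean distance from x to a training example whose label passes `keep`."""
    return min((math.dist(x, xi) for xi, yi in examples if keep(yi)), default=math.inf)


def nonconformity(x: Sequence[float], y: int, training: List[Example]) -> float:
    """Assumed 1-NN ratio score: distance to the nearest example with label y divided by
    distance to the nearest example with any other label (larger = more nonconforming)."""
    same = nearest_distance(x, training, lambda lab: lab == y)
    other = nearest_distance(x, training, lambda lab: lab != y)
    return math.inf if other == 0.0 else same / other


def mondrian_icp(proper_train: List[Example], calibration: List[Example],
                 x_new: Sequence[float], labels: Sequence[int]) -> Dict[int, float]:
    """Label-conditional inductive conformal p-value for each candidate label of x_new."""
    p_values = {}
    for y in labels:
        alpha_new = nonconformity(x_new, y, proper_train)
        # Only calibration examples that carry label y are compared (Mondrian taxonomy).
        cal_scores = [nonconformity(xi, yi, proper_train) for xi, yi in calibration if yi == y]
        at_least = sum(1 for a in cal_scores if a >= alpha_new)
        p_values[y] = (at_least + 1) / (len(cal_scores) + 1)
    return p_values


def prediction_set(p_values: Dict[int, float], epsilon: float) -> Set[int]:
    """All labels whose p-value exceeds the significance level epsilon."""
    return {y for y, p in p_values.items() if p > epsilon}


# Toy data (hypothetical, two features per compound).
current_train = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((1.0, 1.1), 1), ((0.9, 1.0), 1)]
legacy = [((0.1, 0.2), 0), ((1.1, 0.9), 1)]  # stands in for legacy-system assay data
calibration = [((0.1, 0.0), 0), ((0.0, 0.2), 0), ((1.0, 0.9), 1), ((1.2, 1.0), 1)]

# Legacy data augment the proper training set; calibration stays current-only.
p = mondrian_icp(current_train + legacy, calibration, (0.05, 0.05), labels=(0, 1))
print(p, prediction_set(p, epsilon=0.4))
```

With only two calibration examples per class, the smallest attainable p-value here is 1/3, so a non-trivial prediction set requires a significance level of at least that size; in this toy case label 0 gets p = 1.0 and label 1 gets p = 1/3, giving the set {0} at epsilon = 0.4. In practice the calibration set would be far larger, and the nonconformity score would come from a trained model rather than raw nearest-neighbour distances.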
