Manifold Learning in Computational Biology

This is a thesis from the Centre for Mathematical Sciences, Lund University

Abstract: This thesis deals with manifold learning techniques and their application in gene expression data analysis. Manifold learning is the study of methods that aim to infer geometrical structure from data sampled from manifolds, enabling nonlinear solutions to various machine learning tasks. Gene expression data analysis is the analysis of measurements of the abundance of gene products from a set of genes in the cell, which, through the use of microarray technology, can cover the whole genome. Since the expression of one gene is dynamically linked to the expression of others, it is reasonable to assume that such expression data exhibits nonlinear structure, which makes it natural to approach its analysis using nonlinear methods such as manifold learning.

Within the methodological development of manifold learning, this thesis presents a method for robust estimation of geodesic distances (paper I) and a method for supervised manifold learning based on kernel dimension reduction (paper II). An extension of the latter algorithm to partitioned data is also presented. Further, a method for variable importance assessment in manifold learning is proposed (paper IV).
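
To make the notion of geodesic distance concrete, the following is a minimal sketch of the standard graph-based estimate used in Isomap-style manifold learning: connect each sample to its nearest neighbours and take shortest-path lengths in that graph. It is not the robust estimator of paper I, whose details the abstract does not give. The function names (`knn_graph`, `geodesic_distances`), the toy half-circle data, and the choice `k=2` are all assumptions made for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Connect each point to its k nearest neighbours (Euclidean), symmetrically."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source, used as geodesic estimates."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (assumed): 50 evenly spaced points on a unit half-circle,
# a one-dimensional manifold embedded in the plane.
n = 50
points = [
    (math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
    for i in range(n)
]
graph = knn_graph(points, k=2)
geo = geodesic_distances(graph, source=0)
euclid = math.dist(points[0], points[-1])
print("graph-based estimate:", geo[n - 1])
print("straight-line distance:", euclid)
```

For the two endpoints of the half-circle, the graph-based estimate is close to the arc length π, whereas the straight-line distance is 2. This gap is what nonlinear methods exploit. A plain nearest-neighbour graph of this kind is sensitive to noise and to the choice of `k`, which is the usual motivation for robust variants.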

Within gene expression data analysis, results are presented that demonstrate better performance of manifold learning methods compared to linear methods in the visualization of microarray samples (paper III). It is also demonstrated how genes can be ranked according to their influence on the observed structure in such nonlinear representations (paper IV). Finally, it is shown how biologically relevant gene/gene similarity measures can be obtained using unsupervised and supervised manifold learning (paper V).