Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection VIPOPLS, VIPO2PLS, and MB-VIOP methods

Detta är en avhandling från Umeå : Umeå University

Sammanfattning: Multivariate and multiblock data analysis involves useful methodologies for analyzing large data sets in chemistry, biology, psychology, economics, sensory science, and industrial processes; among these methodologies, partial least squares (PLS) and orthogonal projections to latent structures (OPLS®) have become popular. Due to the increasingly computerized instrumentation, a data set can consist of thousands of input variables which contain latent information valuable for research and industrial purposes. When analyzing a large number of data sets (blocks) simultaneously, the number of variables and underlying connections between them grow very much indeed; at this point, reducing the number of variables keeping high interpretability becomes a much needed strategy.The main direction of research in this thesis is the development of a variable selection method, based on variable influence on projection (VIP), in order to improve the model interpretability of OnPLS models in multiblock data analysis. This new method is called multiblock variable influence on orthogonal projections (MB-VIOP), and its novelty lies in the fact that it is the first multiblock variable selection method for OnPLS models.Several milestones needed to be reached in order to successfully create MB-VIOP. The first milestone was the development of a single-block variable selection method able to handle orthogonal latent variables in OPLS models, i.e. VIP for OPLS (denoted as VIPOPLS or OPLS-VIP in Paper I), which proved to increase the interpretability of PLS and OPLS models, and afterwards, was successfully extended to multivariate time series analysis (MTSA) aiming at process control (Paper II). The second milestone was to develop the first multiblock VIP approach for enhancement of O2PLS® models, i.e. VIPO2PLS for two-block multivariate data analysis (Paper III). And finally, the third milestone and main goal of this thesis, the development of the MB-VIOP algorithm for the improvement of OnPLS model interpretability when analyzing a large number of data sets simultaneously (Paper IV).The results of this thesis, and their enclosed papers, showed that VIPOPLS, VIPO2PLS, and MB-VIOP methods successfully assess the most relevant variables for model interpretation in PLS, OPLS, O2PLS, and OnPLS models. In addition, predictability, robustness, dimensionality reduction, and other variable selection purposes, can be potentially improved/achieved by using these methods.