Martin Stražar (2018) Low-rank matrix factorization in multiple kernel learning. PhD thesis.
Abstract
The increased rate of data collection, storage, and availability results in a corresponding interest in data analyses and predictive models based on the simultaneous inclusion of multiple data sources. This tendency is ubiquitous in practical applications of machine learning, including recommender systems, social network analysis, finance, and computational biology. The heterogeneity and size of typical datasets call for simultaneous dimensionality reduction and inference from multiple data sources in a single model. Matrix factorization and multiple kernel learning are two general approaches that satisfy this requirement. This work focuses on two specific goals, namely i) finding interpretable, non-overlapping (orthogonal) data representations through matrix factorization and ii) regression with multiple kernels through low-rank approximation of the corresponding kernel matrices, providing non-linear outputs and interpretation of kernel selection. The motivation for the models and algorithms designed in this work stems from RNA biology and the rich complexity of protein-RNA interactions. Although the regulation of RNA fate happens at many levels, bringing in various possible data views, we show how different questions can be answered directly through constraints in the model design. We have developed an integrative orthogonality nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. We show that the integration of multiple data sources improves the predictive accuracy of RNA binding site retrieval, and we report a number of inferred protein-specific patterns that are consistent with experimentally determined properties. Kernel methods are a principled way to extend linear models to non-linear settings. Multiple kernel learning enables modelling with different data views, but is limited by the