Low-rank matrix factorization in multiple kernel learning

Martin Stražar (2018) Low-rank matrix factorization in multiple kernel learning. PhD thesis.

Abstract

The increased rate of data collection, storage, and availability results in a corresponding interest in data analyses and predictive models based on the simultaneous inclusion of multiple data sources. This tendency is ubiquitous in practical applications of machine learning, including recommender systems, social network analysis, finance, and computational biology. The heterogeneity and size of typical datasets call for simultaneous dimensionality reduction and inference from multiple data sources in a single model. Matrix factorization and multiple kernel learning are two general approaches that satisfy this requirement. This work focuses on two specific goals, namely i) finding interpretable, non-overlapping (orthogonal) data representations through matrix factorization and ii) regression with multiple kernels through low-rank approximation of the corresponding kernel matrices, providing non-linear outputs and interpretation of kernel selection. The motivation for the models and algorithms designed in this work stems from RNA biology and the rich complexity of protein-RNA interactions. Although RNA fate is regulated at many levels, which gives rise to various possible data views, we show how different questions can be answered directly through constraints in the model design. We have developed an integrative orthogonality nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. We show that the integration of multiple data sources improves the predictive accuracy of RNA binding site retrieval and report a number of inferred protein-specific patterns that are consistent with experimentally determined properties. Kernel methods are a principled way to extend linear models to non-linear settings. Multiple kernel learning enables modelling with different data views, but is limited by the

Item Type:

Thesis (PhD thesis)

Keywords:

Machine learning, bioinformatics, matrix factorization, kernel methods, multiple kernel learning, linear regression, protein-RNA interactions.

Number of Pages:

162

Language of Content:

English

Mentor / Comentors:

Name and Surname	ID	Function
doc. dr. Tomaž Curk	299	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537959363)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

4258

Date Deposited:

26 Sep 2018 17:36

Last Modified:

08 Oct 2018 11:05

URI:

http://eprints.fri.uni-lj.si/id/eprint/4258
