Marinka Žitnik (2012) A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources. Prešeren awards for students.
Abstract
Data are growing rapidly in both quantity and variety across all areas of human endeavour, and treating these sources of information in an integrated way is a major challenge. We propose a new computational method for inferring prediction models. The method uses symmetric penalized matrix tri-factorization and prioritizes predictions by estimating probabilities from the matrix factors. The approach introduces a new concept of data integration, an intermediate strategy, that is both generally applicable and highly effective and reliable. Its major advantages are an elegant mathematical formulation of the problem, the ability to integrate any kind of data that can be expressed in matrix form, and high predictive accuracy. We tested the method on predicting gene annotations of the social amoeba D. discoideum. The resulting model integrates gene expression data, protein-protein interactions and known gene annotations. The model inferred by the proposed method achieves higher accuracy than the standard techniques of early and late integration, which combine inputs and predictions, respectively, and which have been favourably reported for their accuracy in the past. Using the proposed approach, we also predicted that a few D. discoideum genes not previously associated with bacterial resistance may play a role in it. The amoeba is an important model organism, also known for preying on bacteria, some of which are dangerous to humans and have recently become increasingly resistant to existing antibiotics. Until now, only a handful of genes were known to participate in the related bacterial recognition pathways of the amoeba. Our predictions of five new genes were confirmed in wet-lab experiments at the collaborating institution (Baylor College of Medicine, Houston, USA). Expanding the list of such genes is crucial for studying the mechanisms of bacterial resistance and can contribute to research on alternative antibacterial therapies.
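To make the idea of matrix tri-factorization more concrete, the following is a minimal sketch only, not the method of the thesis. It assumes a single nonnegative relation matrix R (for example, a hypothetical genes-by-annotation-terms matrix) and approximates it as R ≈ G1 S G2^T using standard multiplicative updates for the squared error. The symmetric, penalized formulation over several data sources and the probability-based prioritization described in the abstract are not reproduced here. The names `tri_factorize`, `relation`, `k1` and `k2` are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def multiplicative_step(m, numer, denom, eps=1e-9):
    """Element-wise m * numer / (denom + eps); nonnegative inputs stay nonnegative."""
    return [[v * n / (d + eps) for v, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(m, numer, denom)]


def squared_error(r, g1, s, g2):
    """Squared Frobenius distance between r and g1 * s * g2^T."""
    approx = matmul(matmul(g1, s), transpose(g2))
    return sum((x - y) ** 2 for rr, ra in zip(r, approx) for x, y in zip(rr, ra))


def random_matrix(rng, rows, cols):
    """Strictly positive random start, so no entry is stuck at zero initially."""
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]


def tri_factorize(r, k1, k2, iterations=200, seed=0):
    """Approximate nonnegative r (n x m) as g1 (n x k1) * s (k1 x k2) * g2^T (k2 x m)."""
    rng = random.Random(seed)
    n, m = len(r), len(r[0])
    g1 = random_matrix(rng, n, k1)
    s = random_matrix(rng, k1, k2)
    g2 = random_matrix(rng, m, k2)
    rt = transpose(r)
    errors = [squared_error(r, g1, s, g2)]
    for _ in range(iterations):
        # Update g1 with s and g2 fixed: r ~ g1 * b, where b = s * g2^T.
        b = matmul(s, transpose(g2))
        g1 = multiplicative_step(g1, matmul(r, transpose(b)),
                                 matmul(g1, matmul(b, transpose(b))))
        # Update g2 with g1 and s fixed: r^T ~ g2 * a^T, where a = g1 * s.
        a = matmul(g1, s)
        g2 = multiplicative_step(g2, matmul(rt, a),
                                 matmul(g2, matmul(transpose(a), a)))
        # Update the middle factor s with g1 and g2 fixed.
        g1t = transpose(g1)
        numer = matmul(matmul(g1t, r), g2)
        denom = matmul(matmul(matmul(g1t, g1), s), matmul(transpose(g2), g2))
        s = multiplicative_step(s, numer, denom)
        errors.append(squared_error(r, g1, s, g2))
    return g1, s, g2, errors


# Toy relation matrix (made-up data): 5 row objects, 4 column objects, two blocks.
relation = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
g1, s, g2, errors = tri_factorize(relation, k1=2, k2=2)
print("initial error:", errors[0])
print("final error:", errors[-1])
```

In this sketch, the rows of `g1` and `g2` act as low-dimensional profiles of the row and column objects, and `s` links the two latent spaces. The reconstructed entries of `g1 * s * g2^T` could then be read as scores for unobserved associations, which is only loosely analogous to the prediction step the abstract describes.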
Item Type: | Thesis (Prešeren awards for students) |
Keywords: | matrix factorization, heterogeneous data sources, data fusion, prediction model, gene annotation, bacterial resistance |
Number of Pages: | 99 |
Language of Content: | Slovenian |
Mentor / Comentors: | prof. dr. Blaž Zupan (ID: 106), Mentor |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=9562452) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 3697 |
Date Deposited: | 21 Dec 2016 10:43 |
Last Modified: | 10 Feb 2017 08:23 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/3697 |