ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources

Marinka Žitnik (2012) A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources. EngD thesis.

    Abstract

    Data are growing rapidly in both quantity and variety across all areas of human endeavour, and treating these sources of information in an integrated way is a major challenge. We propose a new computational framework for inferring prediction models, based on symmetric penalized matrix tri-factorization and an intermediate strategy for data integration. The main advantages of the approach are an elegant mathematical formulation of the problem, the ability to integrate any kind of data that can be expressed in matrix form, and high predictive accuracy. We tested the framework on predicting gene annotations for the social amoeba Dictyostelium discoideum. The developed model integrates gene expression data, protein-protein interactions and known gene annotations. It achieves higher accuracy than the standard techniques of early and late integration, which combine inputs and predictions, respectively, and which have been favourably reported for their accuracy in the past. Using the proposed approach, we also predicted a set of D. discoideum genes that may have a role in bacterial resistance and that were not previously associated with this function. Until now, only a handful of genes were known to participate in the related bacterial recognition pathways. Expanding the list of such genes is crucial for studying the mechanisms of bacterial resistance and can contribute to research on alternative antibacterial therapies. Our predictions were confirmed in wet-lab experiments at the collaborating institution (Baylor College of Medicine, Houston, USA).
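
The abstract describes the two baseline strategies as combining inputs (early integration) and combining predictions (late integration). The sketch below illustrates only that distinction on synthetic data. It is not the intermediate tri-factorization method of the thesis. The function names (`fit_ridge`, `early_integration`, `late_integration`), the choice of ridge regression as the base model, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge weights: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def early_integration(train_sources, y_train, test_sources):
    """Combine inputs: concatenate all feature matrices, fit one model."""
    w = fit_ridge(np.hstack(train_sources), y_train)
    return np.hstack(test_sources) @ w


def late_integration(train_sources, y_train, test_sources):
    """Combine predictions: fit one model per source, average the outputs."""
    preds = [
        X_te @ fit_ridge(X_tr, y_train)
        for X_tr, X_te in zip(train_sources, test_sources)
    ]
    return np.mean(preds, axis=0)


# Synthetic stand-ins for two data sources describing the same 100 objects
# (hypothetical data, not taken from the thesis).
rng = np.random.default_rng(0)
n_train, n_test = 80, 20
source_a = rng.normal(size=(n_train + n_test, 5))
source_b = rng.normal(size=(n_train + n_test, 3))
target = source_a[:, 0] + source_b[:, 1] + 0.1 * rng.normal(size=n_train + n_test)

train, test = slice(0, n_train), slice(n_train, None)
early_pred = early_integration(
    [source_a[train], source_b[train]], target[train],
    [source_a[test], source_b[test]],
)
late_pred = late_integration(
    [source_a[train], source_b[train]], target[train],
    [source_a[test], source_b[test]],
)
early_mse = float(np.mean((early_pred - target[test]) ** 2))
late_mse = float(np.mean((late_pred - target[test]) ** 2))
```

In this toy setup the target depends on both sources at once, so the single model fitted on concatenated inputs can capture it, while each per-source model sees only part of the signal. Real data sets may behave differently.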

    Item Type: Thesis (EngD thesis)
    Keywords: matrix factorization, heterogeneous data sources, data fusion, prediction model, gene annotation, bacterial resistance
    Number of Pages: 98
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname        ID     Function
    prof. dr. Blaž Zupan    106    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009341780)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1775
    Date Deposited: 18 Jul 2012 14:32
    Last Modified: 06 Sep 2012 09:08
    URI: http://eprints.fri.uni-lj.si/id/eprint/1775
