ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Classification of viral genomes using machine learning

Matej Kopar (2015) Classification of viral genomes using machine learning. EngD thesis.



    In this diploma thesis, our goal was to classify viral sequences into taxonomic groups using different machine learning methods. We assembled the taxonomic structure from data collected from the NCBI website and cleaned the data with several filtering steps. We then evaluated the predictive performance of classical and structured machine learning methods on the task of classification into taxonomic groups, and sought to determine the most suitable way to describe genomic sequences. Describing genomic composition with k-mers yielded poor predictive models, with the best performance only slightly above that of the majority classifier. Methods that can use prior knowledge of the taxonomic relationships between classes performed slightly better than methods that do not use such information.
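    The k-mer description and the majority-classifier baseline mentioned above can be illustrated with a short sketch. The abstract does not state the value of k, how ambiguous bases are handled, or whether counts are normalized. The code below therefore assumes a DNA alphabet (A, C, G, T), a hypothetical default of k = 3, skipping of k-mers that contain other characters (such as N), and relative frequencies as features. The function names kmer_profile and majority_baseline_accuracy are hypothetical and do not come from the thesis.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA sequences; other symbols (e.g. N) are skipped


def kmer_profile(sequence: str, k: int = 3) -> list[float]:
    """Hypothetical helper: relative frequencies of all 4**k k-mers.

    Features are returned in lexicographic k-mer order (AAA, AAC, ...),
    so every sequence maps to a vector of the same length.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count overlapping k-mers, ignoring windows with non-ACGT characters.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def majority_baseline_accuracy(labels: list[str]) -> float:
    """Hypothetical helper: accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)
```

    For example, kmer_profile("ACGTACGT", k=2) returns a 16-element vector in which AC, CG and GT each have frequency 2/7 and TA has 1/7. A model trained on such vectors is only informative if its accuracy clearly exceeds majority_baseline_accuracy on the same labels, which is the comparison the abstract reports.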

    Item Type: Thesis (EngD thesis)
    Keywords: machine learning, classification, support vector machine, random forest, viral sequences, structured machine learning
    Number of Pages: 43
    Language of Content: Slovenian
    Mentor / Co-mentors:
    Name and Surname | ID | Function
    doc. dr. Tomaž Curk | 299 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536600003)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3153
    Date Deposited: 18 Sep 2015 17:07
    Last Modified: 27 Oct 2015 14:14
    URI: http://eprints.fri.uni-lj.si/id/eprint/3153
