ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Deep learning on genomic and phylogenetic data

Nina Mrzelj (2016) Deep learning on genomic and phylogenetic data. EngD thesis.


    Deep learning methods have achieved strong results across many fields, genomics being an important one. In this thesis, deep learning methods were used to classify bacterial DNA sequences into taxonomic ranks. The goal was to build a classification model that takes a bacterium's 16S rRNA sequence and predicts its phylum, class, order, family and genus. Five models were compared in terms of accuracy and F1 score: a convolutional neural network, a simple recurrent neural network, a bidirectional recurrent neural network, a hybrid model that combines convolutional and recurrent networks, and a random forest. Two experiments were conducted. In the first, classification was based on the whole sequence; in the second, only a short sequence fragment was used. The models were evaluated on two datasets of different sizes. The results show that the convolutional neural network outperformed the other models in all cases.
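The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the thesis code. The helper names, the all-zero encoding of ambiguous bases, the fragment window and the choice of macro averaging for the F1 score are all assumptions made for illustration; the abstract does not specify them.

```python
# Illustrative sketch only; every name and default here is hypothetical.
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as rows of four 0/1 values (A, C, G, T).

    Bases outside the alphabet (for example N) become all-zero rows.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in sequence.upper()]


def fragment(sequence, start, length):
    """Return a fixed-length window, as a fragment-based experiment might use."""
    return sequence[start:start + length]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

A one-hot matrix of this shape is a common input for a one-dimensional convolutional network, and the two metric functions correspond to the accuracy and F1 comparison mentioned above, computed separately for each taxonomic rank.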

    Item Type: Thesis (EngD thesis)
    Keywords: deep learning, classification, neural networks
    Number of Pages: 35
    Language of Content: Slovenian
    Mentor / Co-mentors:
    Name and Surname | ID | Function
    prof. dr. Blaž Zupan | 106 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537217987)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3573
    Date Deposited: 13 Sep 2016 14:36
    Last Modified: 19 Oct 2016 10:40
    URI: http://eprints.fri.uni-lj.si/id/eprint/3573
