Nina Mrzelj (2016) Deep learning on genomic and phylogenetic data. EngD thesis.
Abstract
Deep learning methods have achieved strong results on a wide range of problems in many fields, genomics being an important one. In this thesis, deep learning methods are used to classify bacterial DNA sequences at several taxonomic ranks. The goal was to build a classification model that takes a bacterium's 16S rRNA sequence and predicts its phylum, class, order, family and genus. Five models were built and compared in terms of accuracy and F1 score: a convolutional neural network, a simple recurrent neural network, a bidirectional recurrent neural network, a hybrid model that combines a convolutional and a recurrent neural network, and a random forest. Two experiments were conducted. In the first, classification was based on the whole sequence. In the second, only a small sequence fragment was used. The models were evaluated on two datasets of different sizes. The results show that convolutional neural networks outperformed the other models in all cases.
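The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names, the toy phylum labels, and the choice of macro averaging for the multi-class F1 score are all assumptions made for illustration, since the abstract does not say how F1 was averaged.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of sequences whose predicted taxon equals the true taxon."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true class but was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall for this class.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical phylum-level labels for four sequences.
y_true = ["Firmicutes", "Firmicutes", "Proteobacteria", "Actinobacteria"]
y_pred = ["Firmicutes", "Proteobacteria", "Proteobacteria", "Actinobacteria"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Under this sketch the same two functions would be applied once per taxonomic rank (phylum, class, order, family, genus), which gives one accuracy and one F1 value per rank for each model.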