Tomislav Slijepčević (2018) Deep Models for Classification of Biomedical Documents. MSc thesis.
Abstract
In this master's thesis, we developed a model that represents life-science texts in a vector form suitable for machine learning. Our corpus consisted of abstracts from the MEDLINE collection, which are labeled with annotations from the MeSH ontology. The model uses a deep neural network to predict MeSH annotations from a text. As the vector representation of a text, we used the network's penultimate layer, which has 1000 neurons. We compared the model to multinomial logistic regression that predicts MeSH annotations from vector representations of texts obtained with doc2vec. On the task of predicting MeSH annotations on the test dataset, our model achieved higher accuracy. In addition, the vector representations obtained with our model gave better point-based visualizations with the t-SNE method than those obtained with doc2vec.
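The idea of reusing the penultimate layer as a document vector can be illustrated with a minimal sketch. It is not the thesis implementation: the bag-of-words input, the single hidden layer, the ReLU activation, the sigmoid multi-label output, and the sizes VOCAB_SIZE and NUM_LABELS are all assumptions made for illustration. Only the 1000-neuron penultimate layer comes from the abstract. Random weights stand in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 5000   # hypothetical bag-of-words input size (assumption)
EMBED_SIZE = 1000   # penultimate layer width, as stated in the abstract
NUM_LABELS = 200    # hypothetical number of MeSH terms (assumption)

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(0.0, 0.01, (VOCAB_SIZE, EMBED_SIZE))
b1 = np.zeros(EMBED_SIZE)
W2 = rng.normal(0.0, 0.01, (EMBED_SIZE, NUM_LABELS))
b2 = np.zeros(NUM_LABELS)


def embed(x):
    """Return penultimate-layer activations, used as document vectors."""
    return np.maximum(0.0, x @ W1 + b1)  # ReLU is an assumed activation


def predict_proba(x):
    """Return one independent probability per label (assumed sigmoid output)."""
    logits = embed(x) @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))


# Four synthetic documents as random feature vectors.
docs = rng.random((4, VOCAB_SIZE))
vectors = embed(docs)        # one 1000-dimensional vector per document
probs = predict_proba(docs)  # one probability per label per document
```

In a setup like this, the rows of `vectors` would be the representations passed on to downstream tasks such as a t-SNE visualization, while `probs` would be thresholded to obtain predicted annotations.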