Urban Baumkirher (2017) Automatic image captioning using deep neural networks. EngD thesis.
Abstract
We implemented a deep neural network trained to generate image captions, connecting computer vision and natural language processing. Following existing architectures for this problem, we built our model with the Keras library in Python, using data from the MS COCO dataset. Our solution implements a bimodal architecture combining deep convolutional, recurrent, and fully connected neural networks. We extracted image features with the VGG16 architecture and represented words with GloVe embeddings. The final model was trained on 82,783 images and tested on 40,504 images with their descriptions. Evaluated with the BLEU metric, the model achieved a score of 49.0 and a classification accuracy of 60%. It did not surpass current state-of-the-art models, but we see many possibilities for improvement.
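The abstract describes a bimodal "merge" captioning architecture: VGG16 image features and GloVe-embedded partial captions are processed in separate branches, combined, and used to predict the next word. A minimal Keras sketch of that idea is shown below; the layer sizes, vocabulary size, and caption length are illustrative assumptions, not the thesis's actual hyperparameters.

```python
# Hedged sketch of a bimodal (merge) image-captioning model, assuming
# 4096-d VGG16 fc2 image features and 100-d GloVe word embeddings.
# All sizes below are illustrative assumptions.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 5000   # assumed vocabulary size
max_len = 20        # assumed maximum caption length
embed_dim = 100     # assumed GloVe embedding dimension

# Image branch: project VGG16 features into the decoder's hidden size.
img_in = Input(shape=(4096,))
img_feat = Dense(256, activation='relu')(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and summarize it with an LSTM.
txt_in = Input(shape=(max_len,))
txt_emb = Embedding(vocab_size, embed_dim, mask_zero=True)(txt_in)
txt_feat = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge the two modalities and predict the next word over the vocabulary.
merged = add([img_feat, txt_feat])
hidden = Dense(256, activation='relu')(merged)
out = Dense(vocab_size, activation='softmax')(hidden)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

At inference time, a caption is generated word by word: the model is fed the image features and the caption generated so far, and the highest-probability (or beam-searched) word is appended until an end token or the length limit is reached.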
Item Type: | Thesis (EngD thesis) |
Keywords: | image captioning, machine learning, deep learning, neural networks, convolutional neural networks, recurrent neural networks, LSTM neural networks |
Number of Pages: | 46 |
Language of Content: | Slovenian |
Mentor: | izr. prof. dr. Marko Robnik Šikonja |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537499331) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 3886 |
Date Deposited: | 29 Aug 2017 12:04 |
Last Modified: | 07 Sep 2017 10:16 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/3886 |