ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Automatic image captioning using deep neural networks

Urban Baumkirher (2017) Automatic image captioning using deep neural networks. EngD thesis.

    We implemented a deep neural network and trained it to generate image captions, a task that connects computer vision and natural language processing. We followed existing architectures for this problem and implemented our own with the Keras library in Python. The data came from the MS COCO online dataset. Our solution uses a bimodal architecture that combines deep convolutional, recurrent, and fully connected neural networks. We extracted image features with the VGG16 architecture and represented words with GloVe embeddings. The final model was trained on 82,783 images and tested on 40,504 images, together with their descriptions. Evaluated with the BLEU metric, the model achieved a score of 49.0 and a classification accuracy of 60%. It does not surpass current state-of-the-art models, but we see many possibilities for improvement.
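
The BLEU metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical, the sketch scores a single tokenized sentence with uniform n-gram weights and no smoothing, and it returns a value in [0, 1], whereas the reported 49.0 is presumably on a 0 to 100 scale and may come from a different BLEU variant.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in any single reference caption."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing.

    Returns 0.0 if any n-gram precision is zero (assumption: unsmoothed)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


# Illustrative captions, not taken from MS COCO.
reference = "a dog runs on the grass".split()
print(bleu("a dog runs on the grass".split(), [reference]))
print(bleu("a dog".split(), [reference], max_n=1))
```

The clipping step is what keeps a degenerate caption such as "the the the the" from scoring well, and the brevity penalty discourages captions that are much shorter than the references.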

    Item Type: Thesis (EngD thesis)
    Keywords: image captioning, machine learning, deep learning, neural networks, convolutional neural networks, recurrent neural networks, LSTM neural networks
    Number of Pages: 46
    Language of Content: Slovenian
    Mentor / Co-mentors:
    Name and Surname                         ID    Function
    Assoc. Prof. Dr. Marko Robnik Šikonja    276   Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537499331)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3886
    Date Deposited: 29 Aug 2017 12:04
    Last Modified: 07 Sep 2017 10:16
    URI: http://eprints.fri.uni-lj.si/id/eprint/3886
