ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Word embeddings for detection of verbal idioms in Slovene

Tilen Zelinka (2019) Word embeddings for detection of verbal idioms in Slovene. EngD thesis.



    Word embeddings map words to a high-dimensional vector space in which words with similar meanings have similar vectors. We analyzed the problem of automatically identifying verbal idioms in Slovene using features built from the embeddings of single words and of word groups. For this purpose, we built two data sets containing verbal idioms and random word groups, each described by the corresponding features. Using these data sets, we evaluated the classification of verbal idioms with support vector machines, random forests, and logistic regression. All three methods were successful, with random forests performing best. Because the approach is computationally expensive and can only identify word groups that have precomputed embeddings, it requires further improvement to be practically useful.
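
    The abstract does not list the actual features, so the following is only a minimal sketch of how features might be built from embeddings of single words and of a word group. It assumes that each candidate group has a precomputed embedding of its own, and that comparing it with the mean of its component word vectors is informative. The function names, feature names, and toy vectors are hypothetical and are not taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def group_features(word_vectors, group_vector):
    """Hypothetical feature set for one word group (an assumption,
    not the feature set used in the thesis).

    word_vectors: embeddings of the individual words in the group
    group_vector: precomputed embedding of the group treated as one unit
    """
    composed = mean_vector(word_vectors)
    sims = [cosine(w, group_vector) for w in word_vectors]
    return {
        "sim_group_vs_mean": cosine(composed, group_vector),
        "min_sim_word_vs_group": min(sims),
        "max_sim_word_vs_group": max(sims),
    }

# Toy 3-dimensional vectors, invented purely for illustration.
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
literal = group_features(words, [1.0, 1.0, 0.0])    # group vector close to its parts
idiomatic = group_features(words, [0.0, 0.0, 1.0])  # group vector far from its parts
```

    Feature dictionaries of this kind could then be turned into rows of a data set and passed to any of the three classifiers named in the abstract. The sketch also shows the stated limitation: a group without a precomputed embedding cannot be described at all.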

    Item Type: Thesis (EngD thesis)
    Keywords: natural language processing, word embeddings, multiword expressions, machine learning
    Number of Pages: 29
    Language of Content: Slovenian
    Mentor / Comentors: prof. dr. Marko Robnik Šikonja (ID: 276, Function: Mentor)
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1538206659)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 4408
    Date Deposited: 20 Mar 2019 16:26
    Last Modified: 17 Apr 2019 10:09
    URI: http://eprints.fri.uni-lj.si/id/eprint/4408
