ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Generating Slovene word forms using machine learning

Rok Rejc (2017) Generating Slovene word forms using machine learning. EngD thesis.


    Abstract

    Sloleks is a lexicon of Slovene word forms: a structured database containing Slovene words together with all their word forms, word classes, and morphosyntactic properties. Because the language keeps changing and the need for machine processing keeps growing, Sloleks must be updated continually. The aim of the thesis was to create a machine-learning tool that allows automated extension of Sloleks. We focused mainly on nouns, but the tool can also be used for other word classes, such as verbs or adjectives. We tackled the problem by clustering nouns into groups with similar morphosyntactic properties, using clustering around medoids. Based on the obtained groups, which represent morphosyntactic paradigms, we built a naive Bayes classifier that predicts these paradigms for new words. For nouns from the ccGigafida corpus that have missing word forms, we predicted groups with the built classifier and filled in the missing word forms using typical representatives of the classes. The approach was evaluated qualitatively and quantitatively.
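
    The pipeline described in the abstract (clustering around medoids, a naive Bayes classifier over the resulting groups, and completion of missing forms from a typical representative) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the six-noun toy lexicon, the representation of a paradigm as suffix rules, the Hamming distance between paradigms, the lemma-ending features, and all function names.

```python
import math
from collections import Counter, defaultdict

# Toy lexicon (illustrative, not Sloleks data): lemma -> (genitive sg., dative sg.)
LEXICON = {
    "miza": ("mize", "mizi"), "riba": ("ribe", "ribi"),
    "korak": ("koraka", "koraku"), "stol": ("stola", "stolu"),
    "mesto": ("mesta", "mestu"), "okno": ("okna", "oknu"),
}

def ending_rule(lemma, form):
    """Describe a form as (chars to strip from the lemma, suffix to add)."""
    p = 0
    while p < min(len(lemma), len(form)) and lemma[p] == form[p]:
        p += 1
    return (len(lemma) - p, form[p:])

def apply_rule(lemma, rule):
    strip, add = rule
    return lemma[:len(lemma) - strip] + add

def dist(a, b):
    """Assumed distance: number of paradigm slots whose rules differ."""
    return sum(x != y for x, y in zip(a, b))

def pam(sigs, k):
    """Simplified PAM swap phase with a naive start (the first k items)."""
    def cost(meds):
        return sum(min(dist(s, sigs[m]) for m in meds) for s in sigs)
    medoids, improved = list(range(k)), True
    while improved:
        improved = False
        for mi in range(k):
            for cand in range(len(sigs)):
                if cand in medoids:
                    continue
                trial = medoids[:mi] + [cand] + medoids[mi + 1:]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
    return medoids

def features(lemma):
    # Assumed features: the last one and last two characters of the lemma.
    return {"last1": lemma[-1:], "last2": lemma[-2:]}

def train_nb(lemmas, labels):
    class_counts, feat_counts, vocab = Counter(labels), defaultdict(Counter), defaultdict(set)
    for lemma, y in zip(lemmas, labels):
        for name, val in features(lemma).items():
            feat_counts[(y, name)][val] += 1
            vocab[name].add(val)
    return class_counts, feat_counts, vocab

def predict_nb(model, lemma):
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    def score(y):
        # Log prior plus Laplace-smoothed log likelihoods (one extra bucket for unseen values).
        s = math.log(class_counts[y] / total)
        for name, val in features(lemma).items():
            s += math.log((feat_counts[(y, name)][val] + 1)
                          / (class_counts[y] + len(vocab[name]) + 1))
        return s
    return max(class_counts, key=score)

lemmas = list(LEXICON)
sigs = [tuple(ending_rule(l, f) for f in LEXICON[l]) for l in lemmas]
medoids = pam(sigs, k=3)
# Label each noun with the index of its nearest medoid (its paradigm group).
labels = [min(medoids, key=lambda m: dist(s, sigs[m])) for s in sigs]
model = train_nb(lemmas, labels)

def complete(lemma):
    """Predict the paradigm group, then copy the medoid's suffix rules."""
    group = predict_nb(model, lemma)
    return tuple(apply_rule(lemma, rule) for rule in sigs[group])

print(complete("lipa"), complete("vlak"), complete("pismo"))
```

    In this sketch the medoid of each cluster plays the role of the typical representative: a new lemma is assigned to a group by the classifier and inherits that medoid's endings. A real lexicon would need full paradigms (all cases and numbers) and a better initialization than taking the first k items.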

    Item Type: Thesis (EngD thesis)
    Keywords: lexicon of word forms, Sloleks, morphosyntactic paradigms, machine learning, naive Bayes classifier, Partitioning Around Medoids
    Number of Pages: 45
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname                          ID     Function
    izr. prof. dr. Marko Robnik Šikonja       276    Mentor
    doc. dr. Simon Krek                              Comentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537389507)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3833
    Date Deposited: 17 Mar 2017 16:05
    Last Modified: 23 Mar 2017 14:11
    URI: http://eprints.fri.uni-lj.si/id/eprint/3833
