ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Using machine learning for placing comma in Slovene

Anja Krajnc (2015) Using machine learning for placing comma in Slovene. EngD thesis.

    We aim to learn comma placement using machine learning. Our approach is based on adding new attributes derived from grammatical rules of the Slovenian language, which provide more information and thus enable better learning, i.e., higher precision and recall. We focus on placing all the commas in a text. We extend existing research with additional learning methods, different parameters, undersampling, and knowledge-based attributes. We use the Šolar corpus and an improved version of the Šolar corpus for testing, together with the machine learning toolkit WEKA. The best results were achieved with random forest, alternating decision tree, and decision table models.
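
The abstract mentions undersampling and evaluation by precision and recall. The following is a minimal Python sketch of those two ideas only, not the thesis's actual WEKA pipeline. It assumes each token position is labelled 1 if a comma follows it and 0 otherwise; the function names `undersample` and `precision_recall`, the `ratio` parameter, and the label encoding are hypothetical and chosen for illustration.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly keep at most ratio * (#comma examples) of the no-comma examples.

    Assumption: label 1 = comma follows the token (minority class),
    label 0 = no comma (majority class).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [examples[i] for i in keep], [labels[i] for i in keep]


def precision_recall(true_labels, predicted_labels):
    """Precision and recall for the comma class (label 1)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a setup like this, undersampling would normally be applied to the training data only, so that precision and recall are still measured on the original class distribution.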

    Item Type: Thesis (EngD thesis)
    Keywords: natural language processing, language manipulation, Slovenian language, comma, punctuation mark, language technologies, random forest, SVM, cross-validation, undersampling, machine learning
    Number of Pages: 79
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    izr. prof. dr. Marko Robnik Šikonja | 276 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536396995)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3001
    Date Deposited: 29 Jun 2015 14:27
    Last Modified: 12 Aug 2015 10:24
    URI: http://eprints.fri.uni-lj.si/id/eprint/3001
