ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Using machine learning for sentiment analysis of Slovene web commentaries

Klemen Kadunc (2016) Using machine learning for sentiment analysis of Slovene web commentaries. EngD thesis.


    The purpose of this work is to develop a tool for sentiment analysis of user comments. Several machine learning classifiers were tested, and multinomial naive Bayes proved to be the best predictor. Several preprocessing techniques were tried, especially those designed for web texts. The classifier was improved with a Slovene sentiment lexicon, a list of words and set phrases with positive or negative connotations. An English sentiment lexicon was manually translated into Slovene for this purpose. The analysed corpus of user comments was manually annotated by three annotators; its entries were selected from some of the most visited Slovene news portals. The lexicon and the annotated corpus of user comments are the main contributions of this work.
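The combination described in the abstract, a multinomial naive Bayes classifier supported by a sentiment lexicon, can be illustrated with a minimal sketch. It is not the thesis implementation: the record does not say how the lexicon was fed into the classifier, so appending marker tokens (`__POS__`, `__NEG__`) for lexicon hits is an assumption. The toy word lists, training comments, and the names `tokenize` and `MultinomialNB` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; the thesis lexicon is not reproduced here.
POSITIVE = {"odlično", "super", "dobro", "krasno"}
NEGATIVE = {"slabo", "grozno", "sramota"}


def tokenize(text):
    """Lowercase, split on whitespace, and append one marker per lexicon hit."""
    words = text.lower().split()
    # Assumption: lexicon knowledge enters the model as extra pseudo-tokens.
    markers = ["__POS__" for w in words if w in POSITIVE]
    markers += ["__NEG__" for w in words if w in NEGATIVE]
    return words + markers


class MultinomialNB:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training comments, for illustration only.
train_texts = [
    "odlično in super članek",
    "zelo dobro napisano",
    "grozno slabo poročanje",
    "to je sramota",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = MultinomialNB().fit(train_texts, train_labels)
```

In this sketch the word "krasno" never occurs in the training comments, yet `clf.predict("krasno")` can still lean on the `__POS__` marker contributed by the lexicon. This is one plausible way a lexicon could help a classifier trained on a small annotated corpus.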

    Item Type: Thesis (EngD thesis)
    Keywords: sentiment analysis, machine learning, opinion mining, natural language processing, classification, annotating text, opinion lexicon, Slovenian language, text preprocessing, user generated content
    Number of Pages: 116
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    Assoc. Prof. Dr. Marko Robnik Šikonja | 276 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536881859)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3317
    Date Deposited: 06 Apr 2016 11:43
    Last Modified: 21 Apr 2016 09:29
    URI: http://eprints.fri.uni-lj.si/id/eprint/3317
