Klemen Kadunc (2016) Using machine learning for sentiment analysis of Slovene web commentaries. EngD thesis.
Abstract
The purpose of this work is to develop a tool for sentiment analysis of user comments. Several machine learning classifiers were tested, and multinomial naive Bayes proved to be the best predictor. Several preprocessing techniques were tried, particularly those designed for web texts. The classifier was improved with a Slovene sentiment lexicon, a list of words and set phrases with positive or negative connotations. An English sentiment lexicon was manually translated into Slovene. The analysed corpus of user comments was drawn from some of the most visited Slovene news portals and manually annotated by three annotators. The lexicon and the annotated corpus of user comments are the main contributions of this work.
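As a rough illustration of the approach the abstract describes, the sketch below combines a multinomial naive Bayes classifier with lexicon-derived features. It is not the thesis implementation: the English toy lexicon, the training comments, the marker tokens `__POS__` and `__NEG__`, and the names `tokenize` and `MultinomialNB` are all assumptions made for this example, and the thesis itself works with Slovene data and a much larger lexicon.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (assumption); the thesis uses a Slovene lexicon.
POSITIVE = {"good", "great", "excellent", "nice"}
NEGATIVE = {"bad", "awful", "terrible", "horrible"}


def tokenize(text):
    """Lowercase, split on whitespace, and append lexicon marker tokens."""
    tokens = text.lower().split()
    # Each lexicon hit adds a pseudo-token, so the lexicon acts as extra features.
    markers = ["__POS__" for t in tokens if t in POSITIVE]
    markers += ["__NEG__" for t in tokens if t in NEGATIVE]
    return tokens + markers


class MultinomialNB:
    """Minimal multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # tokens unseen in training are ignored
                score += math.log(
                    (counts[tok] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumption), standing in for the annotated comment corpus.
train_texts = [
    "great article good points",
    "excellent work",
    "awful comment",
    "terrible and bad writing",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("nice"))
print(model.predict("horrible"))
```

In this sketch the words "nice" and "horrible" never occur in the training comments, so the classifier can only use the lexicon marker tokens for them. That is one simple way a lexicon might help a bag-of-words model generalise; the thesis may integrate the lexicon differently.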
Item Type: | Thesis (EngD thesis) |
Keywords: | sentiment analysis, machine learning, opinion mining, natural language processing, classification, annotating text, opinion lexicon, Slovenian language, text preprocessing, user generated content |
Number of Pages: | 116 |
Language of Content: | Slovenian |
Mentor / Comentors: | izr. prof. dr. Marko Robnik Šikonja (ID: 276, Function: Mentor) |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536881859) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 3317 |
Date Deposited: | 06 Apr 2016 11:43 |
Last Modified: | 21 Apr 2016 09:29 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/3317 |