ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Empirical evaluation of automatic sentiment classification process in financial domain

Sašo Rutar (2016) Empirical evaluation of automatic sentiment classification process in financial domain. EngD thesis.


    Abstract

    In this thesis, we explore several specific aspects of Twitter sentiment analysis. Our sentiment analysis system is based on machine learning and text mining techniques, such as the bag-of-words representation of texts and a support vector machine classifier. We use the system to analyze a stream of short messages (tweets) about financial markets, specifically about stock trading, collected over a span of two years. We classify each message as positive, negative, or neutral, which represents the sentiment or stance towards the stock mentioned in the message. In our case, the term sentiment thus denotes the stance of the author (speaker); a positive or negative class represents the author’s leaning towards buying or selling the stock, respectively. To build the classification model, we employ a relatively large gold standard consisting of approximately half a million tweets hand-labeled by domain experts. For the purpose of this analysis, we developed an evaluation platform and a methodology that allow us, through a series of experiments, to answer various questions that arise when applying sentiment analysis in industrial settings. In the evaluation process, we take the temporal nature of the data into account and thus enable continuous monitoring of the performance of live systems. The results of the analysis reveal (i) the most appropriate classification algorithm, (ii) the optimal size of the labeled data and the subsampling method, (iii) the relationship between classifier performance and the time lag from the training data, (iv) the effect of duplicated tweets (e.g., retweets), and (v) the behavior of the employed classification method in the uncertainty area near the hyperplane of the support vector machine classifier.
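The abstract mentions evaluation that respects the temporal order of the data and measures how performance changes with the time lag from the training data. A minimal sketch of such a time-ordered split follows. It is an illustration only, not the thesis's evaluation platform: the function name `temporal_split`, its parameters, and the toy messages are all hypothetical.

```python
from datetime import datetime, timedelta


def temporal_split(messages, train_end, lag, test_span):
    """Split time-stamped messages into a training set and a later test set.

    messages  -- list of (timestamp, text, label) tuples (hypothetical layout)
    train_end -- messages strictly before this time form the training set
    lag       -- gap between the end of the training data and the test window
    test_span -- length of the test window
    """
    test_start = train_end + lag
    test_end = test_start + test_span
    train = [m for m in messages if m[0] < train_end]
    test = [m for m in messages if test_start <= m[0] < test_end]
    return train, test


# Toy data, invented purely for illustration.
messages = [
    (datetime(2014, 1, 1), "buy $ABC", "positive"),
    (datetime(2014, 1, 15), "selling $ABC", "negative"),
    (datetime(2014, 2, 1), "$ABC earnings today", "neutral"),
    (datetime(2014, 3, 1), "long $ABC", "positive"),
]

train, test = temporal_split(
    messages,
    train_end=datetime(2014, 1, 20),
    lag=timedelta(days=7),
    test_span=timedelta(days=30),
)
```

Repeating the split with increasing values of `lag` while keeping `train_end` fixed would give one test window per lag, which is one simple way to relate classifier performance to the age of the training data.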

    Item Type: Thesis (EngD thesis)
    Keywords: sentiment analysis, machine learning, opinion mining, Twitter, natural language processing, classification with support vector machine, empirical evaluation, financial trading, stocks
    Number of Pages: 70
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    izr. prof. dr. Marko Robnik Šikonja | 276 | Mentor
    doc. dr. Igor Morzetič | | Comentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537049283)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3388
    Date Deposited: 05 Jul 2016 10:41
    Last Modified: 03 Aug 2016 09:29
    URI: http://eprints.fri.uni-lj.si/id/eprint/3388
