ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Automated authorship attribution for Slovenian literary texts

Ines Panker (2012) Automated authorship attribution for Slovenian literary texts. EngD thesis.


    Automated authorship attribution is an umbrella term for methods that try to infer the author of a text from the text itself, using various data mining techniques. Our task was to test how well such methods perform on a subset of Slovenian literary texts. Each text was represented as a vector whose dimensions correspond to the attributes we chose to measure. We began by counting punctuation marks and continued by counting word occurrences. We relied on simple, well-known classifiers: SVM, kNN, classification trees, and the naive Bayes classifier, of which the last gave the best results. The final results were very satisfactory: with rudimentary approaches we achieved a classification accuracy of 78% and an average precision of 87%, with two thirds of the authors reaching a precision of 100%.
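The pipeline described in the abstract (count punctuation marks and words, classify with naive Bayes, report accuracy and per-author precision) can be illustrated with a minimal sketch. This is not the thesis implementation: the feature definitions, the multinomial model with add-one smoothing, the use of ASCII `string.punctuation`, and all names (`extract_features`, `NaiveBayes`, `accuracy`, `precision_per_author`) are assumptions made for illustration, and the toy sentences are invented.

```python
import math
import string
from collections import Counter, defaultdict

# Assumption: ASCII punctuation only; real Slovenian texts may need extra marks.
PUNCTUATION = set(string.punctuation)


def extract_features(text):
    """Return counts of punctuation marks and lowercase words for one text."""
    counts = Counter()
    for ch in text:
        if ch in PUNCTUATION:
            counts["punct:" + ch] += 1
    # Replace punctuation with spaces so that words split cleanly.
    cleaned = "".join(" " if ch in PUNCTUATION else ch for ch in text.lower())
    for word in cleaned.split():
        counts["word:" + word] += 1
    return counts


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative choice)."""

    def fit(self, texts, authors):
        self.feature_counts = defaultdict(Counter)
        self.doc_counts = Counter(authors)
        self.n_docs = len(authors)
        self.vocabulary = set()
        for text, author in zip(texts, authors):
            feats = extract_features(text)
            self.feature_counts[author].update(feats)
            self.vocabulary.update(feats)
        self.totals = {a: sum(c.values()) for a, c in self.feature_counts.items()}
        return self

    def predict(self, text):
        feats = extract_features(text)
        v = len(self.vocabulary)
        best_author, best_score = None, -math.inf
        for author in sorted(self.doc_counts):
            # Log prior plus log likelihood of each observed feature.
            score = math.log(self.doc_counts[author] / self.n_docs)
            for feat, n in feats.items():
                if feat not in self.vocabulary:
                    continue  # ignore features never seen in training
                p = (self.feature_counts[author][feat] + 1) / (self.totals[author] + v)
                score += n * math.log(p)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


def accuracy(true, predicted):
    """Share of texts attributed to the correct author."""
    return sum(t == p for t, p in zip(true, predicted)) / len(true)


def precision_per_author(true, predicted):
    """For each predicted author: correct attributions / all attributions."""
    attributed = Counter(predicted)
    correct = Counter(p for t, p in zip(true, predicted) if t == p)
    return {a: correct[a] / attributed[a] for a in attributed}


# Toy usage with invented sentences and hypothetical authors "A" and "B".
model = NaiveBayes().fit(
    ["Oh! What a day! Truly!", "The river, the hill, the road, the town."],
    ["A", "B"],
)
print(model.predict("What! Oh!"), model.predict("the hill, the town."))
```

Averaging the values returned by `precision_per_author` gives one possible reading of "average precision"; the abstract does not say how the thesis computed it.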

    Item Type: Thesis (EngD thesis)
    Keywords: data mining, authorship attribution, naive Bayes classifier
    Number of Pages: 49
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname          ID     Function
    prof. dr. Janez Demšar    257    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009174100)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1689
    Date Deposited: 11 May 2012 16:56
    Last Modified: 25 May 2012 18:00
    URI: http://eprints.fri.uni-lj.si/id/eprint/1689
