ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Assessment of text readability using statistical and machine learning approaches

Andrejaana Andova (2017) Assessment of text readability using statistical and machine learning approaches. EngD thesis.


    This thesis describes a prototype system that evaluates the readability of Slovene text. We estimated readability with two methods: regression and classification. The regression method returns a numerical estimate of readability expressed as years of education, while the classification method assigns the input to one of two classes, one defined as more readable and the other as less readable. We used the Šolar corpus as the training set and first estimated readability using statistical measures. We then trained several machine learning algorithms on features extracted from the texts. To assess the quality of our prototypes, we used newspapers and magazines from the ccGigafida corpus as the test set.
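The difference between the two outputs described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the features or algorithms used, so the surface features (average sentence length and average word length), the single-feature least-squares fit, the 9-year threshold, and all function names below are assumptions chosen for illustration.

```python
import re
from statistics import mean


def surface_features(text):
    """Return (average sentence length in words, average word length in characters).

    These two features are an assumption; the abstract does not list the
    features actually extracted in the thesis.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so š, č, ž are matched
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return len(words) / len(sentences), mean(len(w) for w in words)


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature.

    Requires at least two distinct x values.
    """
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def predict_years(model, text):
    """Regression output: estimated years of education needed to read the text."""
    a, b = model
    return a * surface_features(text)[0] + b


def classify(model, text, threshold_years=9.0):
    """Classification output: two classes, split at a hypothetical threshold."""
    if predict_years(model, text) > threshold_years:
        return "less readable"
    return "more readable"


# Made-up training pairs (average sentence length, years of education),
# for illustration only; they are not taken from the Šolar corpus.
model = fit_line([8.0, 12.0, 18.0], [4.0, 7.0, 11.0])
sample = "To je kratek stavek. Tudi ta stavek je kratek."
print(predict_years(model, sample), classify(model, sample))
```

The regression function returns a number on a continuous scale, whereas the classifier only reports which side of a cut-off the text falls on. A real system would use labelled corpus texts for training and a separate corpus for testing, as the abstract describes.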

    Item Type: Thesis (EngD thesis)
    Keywords: readability, natural language processing, machine learning
    Number of Pages: 31
    Language of Content: English
    Mentor / Comentors:
    Name and Surname | ID | Function
    izr. prof. dr. Marko Robnik Šikonja | 276 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537689283)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 4031
    Date Deposited: 21 Dec 2017 10:45
    Last Modified: 16 Jan 2018 11:14
    URI: http://eprints.fri.uni-lj.si/id/eprint/4031
