ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Abstractive summarization for Slovene language

Andrej Jugovic (2016) Abstractive summarization for Slovene language. EngD thesis.


    Abstract

    The thesis focuses on automatic summarization of Slovene documents. Large numbers of documents exist in digital form, and summaries make them more accessible to human readers. Summarizing them manually is infeasible at this scale, so we automate the process. Our system uses a parser for the Slovene language to extract triplets consisting of a subject, a predicate (or verb), and an object. We build a graph from the words in the triplets and weight its connections. We rank the nodes with the personalized PageRank (P-PR) algorithm, which assesses the importance of the words in the triplets. We weight the P-PR values of these words with the TF-IDF, Okapi BM25, and word frequency measures. We choose the best triplets and use them to generate summaries. The generated summaries are evaluated with the ROUGE-N and ROUGE-S measures. Evaluation is performed on a corpus built from Wikipedia and also against manually created summaries. The results show that humans create significantly better summaries. The best computer-generated summaries are obtained when graph connections are weighted by the number of bigram occurrences and P-PR values are weighted by the frequency of word occurrence in the triplets.
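
The triplet-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not specify how the graph is built or which nodes personalize the walk, so several things here are assumptions made only for illustration: the function names (`build_graph`, `personalized_pagerank`, `rank_triplets`), the damping factor of 0.85, the seed words, the English example triplets, and the use of within-triplet co-occurrence counts as edge weights in place of the bigram counts mentioned above. P-PR values are weighted by word frequency in the triplets, the variant the abstract reports as best.

```python
from collections import Counter, defaultdict


def build_graph(triplets):
    """Undirected word graph. Words sharing a triplet are linked, and each
    co-occurrence adds 1 to the edge weight (assumed stand-in for bigram counts)."""
    graph = defaultdict(lambda: defaultdict(float))
    for triplet in triplets:
        for i, a in enumerate(triplet):
            for b in triplet[i + 1:]:
                if a != b:
                    graph[a][b] += 1.0
                    graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which the random walk restarts only at the seed words."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("at least one seed word must occur in the graph")
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                # Pass rank along each edge in proportion to its weight.
                new_rank[neighbor] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


def rank_triplets(triplets, seeds):
    """Score each triplet by the sum of its words' P-PR values, each weighted
    by how often the word occurs across all triplets."""
    rank = personalized_pagerank(build_graph(triplets), seeds)
    freq = Counter(word for triplet in triplets for word in triplet)

    def score(triplet):
        return sum(rank.get(word, 0.0) * freq[word] for word in triplet)

    return sorted(triplets, key=score, reverse=True)


# Hypothetical (subject, predicate, object) triplets and seed words.
triplets = [
    ("parser", "finds", "triplets"),
    ("system", "uses", "parser"),
    ("system", "generates", "summaries"),
    ("humans", "write", "summaries"),
]
seeds = ["system", "summaries"]
ranked = rank_triplets(triplets, seeds)
print(ranked[0])
```

In this sketch a summary would be assembled from the highest-ranked triplets. Replacing the frequency weight in `score` with a TF-IDF or BM25 value would give the other weighting variants named in the abstract.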

    Item Type: Thesis (EngD thesis)
    Keywords: natural language processing, document summarization, personalized PageRank algorithm, ROUGE measure, weighted links, automatic document summarization
    Number of Pages: 68
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    izr. prof. dr. Marko Robnik Šikonja | 276 | Comentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537214659)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3569
    Date Deposited: 13 Sep 2016 13:54
    Last Modified: 18 Oct 2016 10:55
    URI: http://eprints.fri.uni-lj.si/id/eprint/3569
