ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Automatic text summarization using semantic analysis

Dušan Božič (2016) Automatic text summarization using semantic analysis. MSc thesis.


    In this thesis, we apply latent semantic analysis (LSA) to automatic multi-document summarization. The LSA algorithm analyzes the relationships between words and documents by producing a set of concepts that describe these relationships. In the preprocessing stage, all words were lemmatized using a Slovenian lexicon. Our work thereby reuses Slovenian academic contributions that are available as Slovenian digital lexicons. The result of the LSA analysis is a list of paragraphs ranked by relevance. The most promising paragraphs are candidates for the summary. To map the lemmatized paragraphs back to the original text correctly, we performed a syntactic analysis of the source text during preprocessing. The resulting extract was then transformed into an abstractive summary using semantic analysis of sentences and lexical chaining. For this purpose we used a Slovenian morphological lexicon. The quality of the obtained summaries was evaluated using the ROUGE measure. We compared these summaries with summaries produced by the analysis of archetypes and with human-written summaries. We implemented the summarization as a stand-alone web application named SimpleX, deployed in a server environment to support the database. Experimental results show that the proposed semantic approach is a step towards summarizing large collections of documents.
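
    The ROUGE evaluation mentioned in the abstract can be illustrated with a minimal sketch of ROUGE-N recall, the share of a reference summary's n-grams that also appear in a candidate summary. This is not the thesis's evaluation code: the function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, and the lowercasing are assumptions made for illustration. A real evaluation of Slovenian text would probably lemmatize the words first.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram recall of a candidate summary against one reference.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        # Reference is shorter than n tokens, so recall is undefined; report 0.
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
```

    In this example five of the six reference unigrams and three of the five reference bigrams are matched, so the two scores are 5/6 and 3/5.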

    Item Type: Thesis (MSc thesis)
    Keywords: natural text summarization, semantic analysis, lexical chaining, analysis of archetypes, extraction-based summarization, semantic-based summarization, summaries, abstracts, SimpleX.
    Number of Pages: 73
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname            ID     Function
    prof. dr. Igor Kononenko    237    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537280195)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3477
    Date Deposited: 31 Aug 2016 12:39
    Last Modified: 15 Nov 2016 11:18
    URI: http://eprints.fri.uni-lj.si/id/eprint/3477
