ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

A presentation of web news from multiple sources

Matjaž Vončina (2012) A presentation of web news from multiple sources. EngD thesis.

    Abstract

    We built a website where visitors can find and read current news from Slovenia drawn from multiple sources. News articles are presented in groups of similar items, which shortens the time needed to find important news and spares visitors from browsing several websites. To achieve this, we built a news database and a news processor. We developed a system that reads and parses news from multiple sources, normalizes it with lemmatization, weights the words in each article, and represents the articles in a vector space model. We used this model to calculate the similarity between articles, which enabled us to cluster similar news. We built a prototype website to display the relevant news clusters.
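
    The abstract describes a pipeline that weights words, represents articles as vectors, compares them, and groups them. A minimal sketch of such a pipeline follows. It is not the thesis implementation: it assumes TF-IDF as the weighting scheme and a greedy single-pass grouping with a fixed threshold, neither of which the abstract specifies. The cosine coefficient is taken from the keywords. The function names, the threshold value, and the sample documents (assumed to be already lemmatized) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts).

    TF-IDF is an assumed weighting; the abstract only says words are weighted.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine coefficient of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(vectors, threshold=0.2):
    """Greedy single-pass grouping (an assumed method, not the thesis's).

    Each document joins the first cluster whose seed document is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up, already lemmatized sample articles (illustration only).
docs = [
    ["vlada", "sprejeti", "proračun", "leto"],
    ["vlada", "potrditi", "proračun", "leto"],
    ["olimpija", "zmagati", "tekma"],
]
print(cluster(tfidf_vectors(docs)))  # prints [[0, 1], [2]]
```

    The first two sample articles share three of their four lemmas and end up in one group, while the third shares none and forms its own. A real system would need to tune the threshold and would likely use a more robust clustering method.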

    Item Type: Thesis (EngD thesis)
    Keywords: text similarity, categorization, lemmatization, cosine coefficient, news
    Number of Pages: 32
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname                   ID    Function
    prof. dr. Marko Robnik Šikonja     276   Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009058132)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1618
    Date Deposited: 20 Feb 2012 19:54
    Last Modified: 05 Apr 2012 18:00
    URI: http://eprints.fri.uni-lj.si/id/eprint/1618
