ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Transformation of semi-structured data from online news site to RDF

Jaka Logar (2013) Transformation of semi-structured data from online news site to RDF. EngD thesis.

    Web pages are now a common source of information, but they are written in natural language and are understandable only to human readers. We would like computers to be able to understand the information they contain as well, for purposes such as statistical processing and data mining. We have developed an application that scrapes data from the RTV Slovenija website and saves it in a structured form that computers can also process. Our main goal was to save posts, the comments on those posts, and user profiles. We save the data as RDF triples because RDF is a standard model for representing this kind of data, and we use the SPARQL query language both to store and to query it. The main advantage of RDF triples is that our data store can be connected to other data stores that also hold RDF triples, for instance DBpedia. Saving the data in structured form also allowed us to build several example visualizations, which would not have been possible otherwise.
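
To illustrate what saving scraped data as RDF triples with SPARQL could look like, here is a minimal sketch in Python. It is not taken from the thesis: the namespaces under `http://example.org/`, the property names (`replyTo`, `author`, `text`), the record fields and the helper functions are all hypothetical, chosen only to show one comment being turned into triples and wrapped in a SPARQL `INSERT DATA` update.

```python
from urllib.parse import quote

# Hypothetical namespaces; the abstract does not say which vocabulary the thesis uses.
EX = "http://example.org/vocab#"
BASE = "http://example.org/data/"


def escape_literal(text):
    """Escape a string for use inside a double-quoted RDF literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def iri(kind, identifier):
    """Build an IRI for a resource, percent-encoding the identifier."""
    return f"<{BASE}{kind}/{quote(str(identifier), safe='')}>"


def prop(name):
    """Build an IRI for a (hypothetical) property."""
    return f"<{EX}{name}>"


def comment_triples(comment):
    """Turn one scraped comment record into a list of triple strings."""
    subject = iri("comment", comment["id"])
    return [
        f"{subject} {prop('replyTo')} {iri('post', comment['post_id'])} .",
        f"{subject} {prop('author')} {iri('user', comment['user'])} .",
        f"{subject} {prop('text')} \"{escape_literal(comment['text'])}\" .",
    ]


def insert_data(triples):
    """Wrap triples in a SPARQL 1.1 INSERT DATA update."""
    body = "\n".join("  " + t for t in triples)
    return "INSERT DATA {\n" + body + "\n}"


# Made-up record standing in for a scraped comment.
scraped = {"id": 7, "post_id": 42, "user": "ana", "text": 'She said "hi"'}
triples = comment_triples(scraped)
update = insert_data(triples)
print(update)
```

The resulting update string could be sent to any SPARQL 1.1 Update endpoint. Because posts, comments and users are all identified by IRIs, a comment can later be linked to resources in another RDF data set simply by adding further triples.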

    Item Type: Thesis (EngD thesis)
    Keywords: RDF, SPARQL, structured form, scraping data, RTV Slovenija
    Number of Pages: 74
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    doc. dr. Dejan Lavbič | 302 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=10355540)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 2306
    Date Deposited: 16 Dec 2013 16:05
    Last Modified: 10 Jan 2014 10:38
    URI: http://eprints.fri.uni-lj.si/id/eprint/2306
