Jaka Logar (2013) Transformation of semi-structured data from online news site to RDF. EngD thesis.
Web pages are a primary source of information today, but they are written in natural language and are directly understandable only to human readers. We would like computers to be able to interpret this information as well, for example for statistical processing or data mining. We developed an application that scrapes data from the RTV Slovenija website and saves it in a structured, machine-readable form. Our main goal was to capture posts, the comments on those posts, and user profiles. We store the data as RDF triples, a standard model for this kind of data, and use the SPARQL query language to store and query it. The main advantage of RDF triples is that our data store can be connected with other data stores that also hold RDF triples, for instance DBpedia. Storing the data in structured form also allowed us to build several example visualizations, which would not have been possible otherwise.
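To illustrate what storing posts, comments and user profiles as triples means, here is a minimal sketch in plain Python. It is not the thesis implementation: the `http://example.org/` namespace, the property names (`type`, `replyTo`, `author`, `name`), the sample data and the helper functions `match` and `commenters` are all hypothetical, and a simple in-memory pattern matcher stands in for a real triple store queried with SPARQL.

```python
# Hypothetical namespace and vocabulary, chosen only for illustration.
EX = "http://example.org/"

# Each fact is one (subject, predicate, object) triple.
triples = {
    (EX + "post/1", EX + "type", EX + "Post"),
    (EX + "post/1", EX + "title", "Example headline"),
    (EX + "comment/1", EX + "type", EX + "Comment"),
    (EX + "comment/1", EX + "replyTo", EX + "post/1"),
    (EX + "comment/1", EX + "author", EX + "user/alice"),
    (EX + "comment/2", EX + "type", EX + "Comment"),
    (EX + "comment/2", EX + "replyTo", EX + "post/1"),
    (EX + "comment/2", EX + "author", EX + "user/bob"),
    (EX + "user/alice", EX + "name", "alice"),
    (EX + "user/bob", EX + "name", "bob"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def commenters(store, post):
    """Join two patterns: comments replying to `post`, then their authors."""
    comments = [s for s, _, _ in match(store, p=EX + "replyTo", o=post)]
    return sorted(
        {o for c in comments for _, _, o in match(store, s=c, p=EX + "author")}
    )
```

Calling `commenters(triples, EX + "post/1")` joins the two patterns in roughly the way a SPARQL query with the patterns `?c ex:replyTo <post>` and `?c ex:author ?u` would. Because every fact has the same three-part shape, triples from another source could in principle be added to the same set and queried together.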