Semi-automatic web site wrapper construction

Rok Burgar (2012) Semi-automatic web site wrapper construction. EngD thesis.

Abstract

This thesis describes the development of a program for scraping data from partially structured web pages. The web is a document-based system, and every document has metadata that describes its structure. Documents on the web are written in HTML. The problem with HTML is that its primary purpose is to describe the visual presentation of a document, not its content. In addition, web documents are written against different HTML standards, and almost no web pages are fully compliant with those standards. The thesis describes how the needed data can be extracted from a web page simply and effectively. The implementation consists of two main programs. The first is a web browser plugin that lets the user mark the locations of the data. The second runs on a web server and extracts the data from a web page based on the locations marked with the first program.
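The record does not say how the marked locations are stored or applied, so the following is only a minimal sketch of the server-side half under stated assumptions. It assumes the browser plugin saves each marked field as a simple tag path such as `html/body/div[2]/h1` (1-based index among same-tag siblings). The names `TreeBuilder`, `resolve`, `scrape` and the path syntax are hypothetical illustrations, not the thesis's actual design. The parser built on Python's standard `html.parser` tolerates unclosed and stray end tags, which matters because, as noted above, most real pages are not standards-compliant.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never left open on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # mix of Node objects and text strings, in document order

    def text_content(self):
        # Concatenate all descendant text in document order.
        return "".join(c if isinstance(c, str) else c.text_content() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a loose element tree; tolerates unclosed tags and stray end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (implicitly closing any
        # unclosed elements inside it); ignore end tags that match nothing.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def resolve(root, path):
    """Follow a hypothetical path such as 'html/body/div[2]/h1'; return None if absent."""
    node = root
    for step in path.split("/"):
        tag, _, rest = step.partition("[")
        index = int(rest.rstrip("]")) if rest else 1
        matches = [c for c in node.children if isinstance(c, Node) and c.tag == tag]
        if len(matches) < index:
            return None
        node = matches[index - 1]
    return node


def scrape(html, template):
    """Apply a template {field: path} to a page; return {field: text or None}."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    result = {}
    for field, path in template.items():
        node = resolve(builder.root, path)
        result[field] = node.text_content().strip() if node is not None else None
    return result


# Malformed page: <p> is never closed and <br> has no end tag.
page = """<html><body>
<div class="nav">Menu</div>
<div><h1>Web scraping</h1><p>Price: <b>42 EUR</b><br>in stock
</body></html>"""

template = {"title": "html/body/div[2]/h1",
            "price": "html/body/div[2]/p/b",
            "missing": "html/body/table"}
print(scrape(page, template))  # {'title': 'Web scraping', 'price': '42 EUR', 'missing': None}
```

Storing positional paths keeps the template trivial to record from a click in the browser, but such paths break when a site inserts or reorders elements; a real wrapper would likely also use attributes such as `class` or `id` to make the marked locations more robust.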

Item Type:

Thesis (EngD thesis)

Keywords:

scraping template, web scraping, semantic web, structure, browser plugin, web server

Language of Content:

Slovenian

Mentor / Co-mentors:

doc. dr. Dejan Lavbič (Mentor)

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009454420)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

1830

Date Deposited:

22 Sep 2012 11:52

Last Modified:

19 Oct 2012 12:41

URI:

http://eprints.fri.uni-lj.si/id/eprint/1830