Scalable Architecture for data aggregation from various web sources

Alan Rijavec (2012) Scalable Architecture for data aggregation from various web sources. EngD thesis.


Abstract

Software programs for collecting data from online sources have been in use for a number of years; the first search engines appeared when the Internet was still in its infancy. The application developed in this thesis is akin to web search engines, which are essentially web spiders (crawlers). It also resembles aggregators, which usually collect the same type of data from different sources. The aim of this work is to develop a program that aggregates data from predefined sources. Typical aggregators collect information of only one type: news aggregators such as Google News collect news items, which always share the same data structure. Our program instead obtains data through objects that are used much like plug-ins. Plug-ins are easy to implement, and corrections require only small changes. The distinguishing feature of our application is the semantic capture of heterogeneous data, where the semantics are defined by each plug-in rather than by the application itself. Users can therefore apply the application in any field they wish, provided the necessary plug-ins are developed.
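The abstract does not say how the plug-ins are implemented or in which language, so the following is only a minimal Python sketch of the general idea, under stated assumptions: the core aggregator runs every registered plug-in and tags records with their origin, while each plug-in alone decides which fields ("semantics") its records carry. All names (`SourcePlugin`, `Aggregator`, `NewsPlugin`, `WeatherPlugin`, the `example` URLs) are hypothetical, and a dictionary of canned pages stands in for real HTTP fetching.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class SourcePlugin(Protocol):
    """Hypothetical plug-in contract: one source, one plug-in-defined schema."""
    name: str
    url: str

    def extract(self, raw: str) -> List[Dict[str, str]]:
        """Turn raw page content into records whose fields the plug-in defines."""
        ...


@dataclass
class Aggregator:
    """Core application: runs plug-ins but knows nothing about the data's meaning."""
    fetch: Callable[[str], str]  # injected fetcher; a real system would do an HTTP GET
    plugins: List[SourcePlugin] = field(default_factory=list)

    def register(self, plugin: SourcePlugin) -> None:
        self.plugins.append(plugin)

    def run(self) -> List[Dict[str, str]]:
        records: List[Dict[str, str]] = []
        for plugin in self.plugins:
            raw = self.fetch(plugin.url)
            for record in plugin.extract(raw):
                # Only the "source" tag comes from the core; all other fields come from the plug-in.
                records.append({"source": plugin.name, **record})
        return records


class NewsPlugin:
    """Hypothetical plug-in: pulls headlines out of <h2> tags."""
    name = "news"
    url = "https://news.example/latest"

    def extract(self, raw: str) -> List[Dict[str, str]]:
        return [{"headline": h} for h in re.findall(r"<h2>(.*?)</h2>", raw)]


class WeatherPlugin:
    """Hypothetical plug-in: parses 'City=temperature' pairs, a different schema."""
    name = "weather"
    url = "https://weather.example/today"

    def extract(self, raw: str) -> List[Dict[str, str]]:
        return [{"city": c, "temp_c": t} for c, t in re.findall(r"(\w+)=(-?\d+)", raw)]


# Usage with canned pages instead of live web sources.
pages = {
    "https://news.example/latest": "<h2>Budget passed</h2><h2>Storm warning</h2>",
    "https://weather.example/today": "Ljubljana=12 Maribor=10",
}
agg = Aggregator(fetch=pages.__getitem__)
agg.register(NewsPlugin())
agg.register(WeatherPlugin())
records = agg.run()
for r in records:
    print(r)
```

Adding a new source in this sketch means writing one more class with `name`, `url` and `extract`; the `Aggregator` itself does not change, which mirrors the abstract's claim that the semantics are defined by each plug-in rather than by the application.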

Item Type:

Thesis (EngD thesis)

Keywords:

aggregator, web crawler, semantic data capture

Number of Pages:

Language of Content:

Slovenian

Mentor / Co-mentors:

Name and Surname	ID	Function
Assoc. Prof. Dr. Marko Bajec	245	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009493588)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

1894

Date Deposited:

16 Oct 2012 14:55

Last Modified:

08 Nov 2012 12:00

URI:

http://eprints.fri.uni-lj.si/id/eprint/1894
