ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Scalable Architecture for data aggregation from various web sources

Alan Rijavec (2012) Scalable Architecture for data aggregation from various web sources. EngD thesis.

    Abstract

    Software for collecting data from online sources has been in use for many years; the first search engines appeared when the Internet was still in its infancy. The application developed in this thesis is akin to web search engines, which are in essence web spiders. It also resembles aggregators, which usually collect the same type of data from different sources. The aim of this work is to develop a program that aggregates data from predefined sources. Typical aggregators collect information of a single type only: news aggregators such as Google News, for example, collect news items that always have the same data structure. Our program instead obtains data through objects that are used much like plug-ins. These plug-ins are easy to implement, and corrections require only small changes. What distinguishes our application is the semantic capture of heterogeneous data, whose structure is defined by each plug-in rather than by the application. The application can therefore serve users in any field, provided the necessary plug-ins are developed.
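The plug-in idea in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the names `SourcePlugin`, `Aggregator`, `NewsPlugin`, `WeatherPlugin`, and `demo`, the `schema` attribute, and the sample records are all hypothetical, and the plug-ins return hard-coded placeholder data instead of crawling a real source. The sketch only shows how each plug-in, rather than the application, could define the structure of the data it delivers.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class SourcePlugin(ABC):
    """Hypothetical plug-in contract (an assumption, not the thesis API).

    Each plug-in names its source and declares the fields of its records.
    """

    name: str
    schema: Dict[str, type]

    @abstractmethod
    def fetch(self) -> Iterable[Dict[str, Any]]:
        """Return records from this plug-in's predefined source."""


class Aggregator:
    """Collects records from registered plug-ins without knowing their fields."""

    def __init__(self) -> None:
        self._plugins: List[SourcePlugin] = []

    def register(self, plugin: SourcePlugin) -> None:
        self._plugins.append(plugin)

    def collect(self) -> List[Dict[str, Any]]:
        results: List[Dict[str, Any]] = []
        for plugin in self._plugins:
            for record in plugin.fetch():
                # Validate each record against the schema the plug-in declared.
                for field, expected in plugin.schema.items():
                    if not isinstance(record.get(field), expected):
                        raise TypeError(f"{plugin.name}: bad field {field!r}")
                results.append({"source": plugin.name, "data": record})
        return results


class NewsPlugin(SourcePlugin):
    name = "news"
    schema = {"title": str, "url": str}

    def fetch(self) -> Iterable[Dict[str, Any]]:
        # Placeholder record; a real plug-in would download and parse a page.
        return [{"title": "Example headline", "url": "http://example.com/a"}]


class WeatherPlugin(SourcePlugin):
    name = "weather"
    schema = {"city": str, "temperature_c": float}

    def fetch(self) -> Iterable[Dict[str, Any]]:
        # Placeholder record with a different structure than the news plug-in.
        return [{"city": "Ljubljana", "temperature_c": 21.5}]


def demo() -> List[Dict[str, Any]]:
    aggregator = Aggregator()
    aggregator.register(NewsPlugin())
    aggregator.register(WeatherPlugin())
    return aggregator.collect()
```

In this sketch, supporting a new field of interest means writing one more `SourcePlugin` subclass; the `Aggregator` itself stays unchanged.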

    Item Type: Thesis (EngD thesis)
    Keywords: aggregator, web crawler, semantic data capture
    Number of Pages: 52
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname                ID     Function
    Assoc. Prof. Dr. Marko Bajec    245    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009493588)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1894
    Date Deposited: 16 Oct 2012 14:55
    Last Modified: 08 Nov 2012 12:00
    URI: http://eprints.fri.uni-lj.si/id/eprint/1894
