Alan Rijavec (2012) Scalable Architecture for data aggregation from various web sources. EngD thesis.
Software for collecting data from online sources has been in use for a number of years; the first search engines appeared when the Internet was still in its infancy. The application developed in this thesis is akin to web search engines, which are in essence web spiders. It also resembles aggregators, which usually collect data of a single type from different sources. News aggregators such as Google News, for example, collect news items that always share the same data structure. The aim of this work is to develop a programme that aggregates data from predefined sources. Our programme obtains data through objects that are used much like plug-ins. These plug-ins are easy to implement, and corrections require only small changes. The distinguishing feature of our application is the semantic capture of heterogeneous data, whose structure is defined by each plug-in rather than by the application. The application can therefore serve users in any field, provided the necessary plug-ins are developed.
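The plug-in idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `Plugin`, `NewsPlugin`, `WeatherPlugin`, `Aggregator`, `fields` and `fetch` are hypothetical, and the plug-ins return hard-coded sample records instead of fetching from real web sources. The point it shows is that each plug-in declares its own record structure, while the aggregator only registers plug-ins and collects whatever they return.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class Plugin(ABC):
    """Hypothetical plug-in interface; each plug-in defines its own schema."""

    name: str = ""
    fields: Tuple[str, ...] = ()

    @abstractmethod
    def fetch(self) -> List[Dict[str, Any]]:
        """Return records from one predefined source."""


class NewsPlugin(Plugin):
    name = "news"
    fields = ("title", "published")

    def fetch(self) -> List[Dict[str, Any]]:
        # Made-up sample record; a real plug-in would query its source here.
        return [{"title": "Example headline", "published": "2012-01-01"}]


class WeatherPlugin(Plugin):
    name = "weather"
    fields = ("city", "temperature_c")

    def fetch(self) -> List[Dict[str, Any]]:
        # Made-up sample record with a different structure than news.
        return [{"city": "Ljubljana", "temperature_c": 21.5}]


class Aggregator:
    """Collects heterogeneous records without knowing their structure."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        self._plugins[plugin.name] = plugin

    def collect(self) -> Dict[str, List[Dict[str, Any]]]:
        results: Dict[str, List[Dict[str, Any]]] = {}
        for name, plugin in self._plugins.items():
            records = plugin.fetch()
            for record in records:
                # Validate against the schema the plug-in itself declared.
                missing = set(plugin.fields) - set(record)
                if missing:
                    raise ValueError(f"{name}: missing fields {sorted(missing)}")
            results[name] = records
        return results


if __name__ == "__main__":
    aggregator = Aggregator()
    aggregator.register(NewsPlugin())
    aggregator.register(WeatherPlugin())
    for source, records in aggregator.collect().items():
        print(source, records)
```

Under this design, supporting a new field of use means writing one more `Plugin` subclass; the `Aggregator` class stays unchanged.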