Tomaž Kuralt (2009) Autonomous system for data networks integration. EngD thesis.
Abstract
In practice, we often face the problem of combining two or more data sets into a single one. Data is often represented in the form of networks, because networks allow an easy formulation of relations between entities. Networks also prove suitable for carrying out various analyses. In such cases, the usual problem of data integration takes the form of combining networks, which is called network integration. In this work we present an autonomous system for integrating any number of networks into a single network. The task of the system is to identify possible redundancy within the set of given networks and to represent every real-world entity as a single element in the final network. We use a collective entity resolution approach, which means that every individual decision depends on decisions previously made in the system. For this purpose we use different attribute and relational metrics. Our entity resolution model also adapts itself to the given data set. Consequently, a domain expert is not needed to define the various model parameters, as is common in similar systems. The system was tested on two real-world data sets, and the results obtained are very good. We have also performed various experiments on synthetic data, in which we observed the quality of integration with respect to different characteristics of the input data sets. We are satisfied with the results of the system, but additional tests on various other data domains would be necessary to estimate the efficiency of the system in general.
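
To illustrate what "collective" means here, the following is a minimal Python sketch and not the system described in the thesis. It assumes that each node carries a single name attribute, that attribute similarity is a string ratio, and that relational similarity is the Jaccard overlap of the neighbours' current clusters. The function names, the weight `alpha`, and the `threshold` are all hypothetical, and they are fixed by hand here, whereas the abstract states that the thesis model adapts its parameters to the data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a, b):
    # Assumed attribute metric: case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relational_similarity(u, v, neighbours, cluster_of):
    # Assumed relational metric: Jaccard overlap of the clusters that the
    # neighbours of u and v currently belong to.
    nu = {cluster_of[n] for n in neighbours[u]}
    nv = {cluster_of[n] for n in neighbours[v]}
    if not nu and not nv:
        return 0.0
    return len(nu & nv) / len(nu | nv)


def integrate(names, edges, alpha=0.5, threshold=0.45):
    # names: node id -> name attribute; edges: pairs of node ids.
    neighbours = {n: set() for n in names}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    cluster_of = {n: n for n in names}  # every node starts as its own entity
    while True:
        best, best_score = None, threshold
        for u, v in combinations(sorted(names), 2):
            if cluster_of[u] == cluster_of[v]:
                continue
            score = (alpha * attribute_similarity(names[u], names[v])
                     + (1 - alpha) * relational_similarity(u, v, neighbours, cluster_of))
            if score > best_score:
                best, best_score = (u, v), score
        if best is None:
            return cluster_of
        # Merge the best pair; later relational scores see this decision.
        old, new = cluster_of[best[1]], cluster_of[best[0]]
        for n in cluster_of:
            if cluster_of[n] == old:
                cluster_of[n] = new


# Two tiny made-up networks (a* and b*) describing the same two people.
names = {"a1": "John Smith", "a2": "Mary Jones",
         "b1": "J. Smith", "b2": "Mary Jones"}
edges = [("a1", "a2"), ("b1", "b2")]
clusters = integrate(names, edges)
attribute_only = integrate(names, edges, alpha=1.0, threshold=0.8)
```

In this toy input the two "Mary Jones" nodes are merged first on attribute evidence alone. That decision then makes the neighbourhoods of "John Smith" and "J. Smith" identical, which lifts their combined score above the threshold. With `alpha=1.0` and a threshold of 0.8 the second pair stays unmerged, which shows how an individual decision can depend on an earlier one.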