Uroš Podobnikar (2016) Data quality management and data cleaning. MSc thesis.
Abstract
Today's enterprises are often challenged by the large amounts of data used in their business operations. Ensuring and maintaining an adequate level of data quality is an important aspect of data quality management for many reasons. On the one hand, adequate data quality represents a competitive advantage; on the other hand, low data quality leads to many unpleasant consequences. Frameworks, methodologies, and tools have been developed to help ensure an adequate level of data quality. In addition, data quality is addressed in legislation and various standards. Despite this, some studies show that the state of data quality in enterprises is poor. The purpose of the thesis is to research and present the area of data quality and to show the issues that result from low data quality. The thesis presents the consequences as well as the causes of low data quality, and explains why data quality is important. In addition, it presents standards, legislation, and best practices that deal with data quality. Data quality issues also arise in the Internet of Things, which has been the subject of much recent research; therefore, the thesis also presents the main issues from that point of view. The main emphasis of the thesis is on the part of the field that deals with data quality and data cleaning. The thesis presents error types and various data cleaning frameworks, and combines their main activities into a consolidated view. Furthermore, the thesis gives an overview of existing software solutions available on the market that support data cleaning tasks. All of the above is covered in the theoretical part of the thesis.
The second part of the thesis is the practical part, which proposes a data quality improvement using a prototype of a software solution. The prototype addresses a specific part of data quality management: maintaining data accuracy by detecting errors in data and offering the possibility of eliminating them (data cleaning). In addition, the thesis proposes how the solution could be deployed in a specific organisation's information system, taking into account the principles and rules suggested in the literature. The conclusion presents essential approaches to help improve data quality in enterprises.