ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Databases for Data Mining

Lado Langof (2015) Databases for Data Mining. MSc thesis.


    Abstract

    This work looks for synergies between data mining tools and database management systems (DBMS). Imagine a situation where we need to solve an analytical problem using data that is too large to be processed entirely in main memory, yet too small to justify setting up a data warehouse or a distributed analytical system. The target environment is therefore a single personal computer used to solve data mining problems. We are looking for tools that allow us to process and prepare such quantities of data effectively for further analysis. The main focus of this work is not on data mining itself but on the second and third steps of the CRISP-DM process standard for data mining, namely data understanding and data preparation. The question is how to use the functionality of various DBMS and ETL tools to prepare data for data mining as effectively as possible. Unneeded data should be ignored and the remainder transformed into an appropriate form. Data mining execution time and accuracy should improve when the input data has been cleaned of unneeded attributes, duplicate records, typos and other unwanted properties. The objective of this work is thus to find appropriate practical methods (tools, combinations of tools, or methodologies) that use DBMS and ETL tools to collect relatively large amounts of data from different sources and in different forms, join them, and transform them into a format that data mining algorithms can use directly.

    Item Type: Thesis (MSc thesis)
    Keywords: ETL, ELT, CRISP-DM, DBMS, data preparation, transformations, data mining
    Number of Pages: 125
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname          ID     Function
    doc. dr. Matjaž Kukar     267    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536378563)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 2993
    Date Deposited: 11 Jun 2015 13:09
    Last Modified: 28 Jul 2015 10:14
    URI: http://eprints.fri.uni-lj.si/id/eprint/2993
