Grega Boštjančič (2010) Data mining with Oracle Data Mining (11g). EngD thesis.
Abstract
Data mining is becoming increasingly important in a number of fields of our lives. It is used by doctors diagnosing medical conditions, by bankers managing risk, and so on. It is the process of discovering information, patterns and correlations by searching through larger or smaller quantities of data. To be successful, every data mining process must consist of well-defined steps or tasks. To reach the goal, we often have to take a step back and improve the result of an earlier step. This process is standardized by CRISP-DM (CRoss-Industry Standard Process for Data Mining). Data mining uses pattern-discovery algorithms as well as statistical and mathematical techniques. Many applications use a sophisticated mix of classic and advanced algorithms such as decision trees, Naive Bayes, Support Vector Machines, etc. In this paper I compared the methods and scalability of two data mining applications. The first, Oracle Data Miner, is included in the Oracle database, whereas the second, Weka, is an independent, open-source Java application. It turns out that Oracle is better at working with larger datasets, while Weka is more accurate on smaller datasets. Weka's main problem is its wasteful use of system resources, which makes it unsuitable for working with large datasets. Oracle, however, sacrifices accuracy in order to always be able to build a model.