ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Clustering-based discretization of numeric attributes

Matej Pičulin (2011) Clustering-based discretization of numeric attributes. EngD thesis.

    Abstract

    We propose a new discretization method that uses clustering to determine candidate boundaries. We use two well-known clustering methods: k-means clustering and hierarchical clustering. Discretization is a well-known and difficult problem in machine learning and data mining, especially for strongly dependent attributes. Most existing methods do not take dependencies into account, so we develop an algorithm that finds dependencies implicitly with the help of clustering. First, we present some known discretization methods and the classification algorithms that we use in this work. We then present the idea of clustering-based discretization and try to answer the following questions: which clustering method to use, how many clusters are needed, how clusters vote for boundaries, and how to choose the final boundaries from the candidates. We extensively test the approach on artificial domains with strong dependencies and on real domains. We test several variations of clustering-based discretization and show that the methods can solve some cases with strongly dependent attributes. Finally, we suggest possible improvements and extensions of the work.
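
The general idea of deriving candidate boundaries from clusters can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a single numeric attribute, a plain k-means with a deterministic start, and boundaries placed at midpoints between adjacent cluster centers. The function names (`kmeans_1d`, `candidate_boundaries`, `discretize`) are hypothetical. The multi-attribute clustering, the voting scheme, and the final boundary selection described in the abstract are not reproduced here.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on a list of floats; returns sorted cluster centers."""
    xs = sorted(values)
    n = len(xs)
    # Assumption: deterministic start, centers spread evenly over the sorted values.
    centers = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            # Assign each value to its nearest center (ties go to the lower index).
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def candidate_boundaries(values, k):
    """Assumption: one candidate boundary halfway between each pair of adjacent centers."""
    centers = kmeans_1d(values, k)
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def discretize(value, boundaries):
    """Map a numeric value to an interval index: the number of boundaries below it."""
    return sum(value > b for b in boundaries)


if __name__ == "__main__":
    attribute = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
    cuts = candidate_boundaries(attribute, k=3)
    print(cuts, [discretize(v, cuts) for v in attribute])
```

Clustering one attribute in isolation cannot detect dependencies between attributes; doing so would require clustering the attributes jointly and projecting the clusters back onto each attribute, which this sketch leaves out.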

    Item Type: Thesis (EngD thesis)
    Keywords: discretization, clustering, machine learning, numeric attributes
    Number of Pages: 38
    Language of Content: Slovenian
    Mentor / Comentors: prof. dr. Marko Robnik Šikonja (ID: 276, Function: Mentor)
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00008291924)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1308
    Date Deposited: 21 Mar 2011 12:18
    Last Modified: 13 Aug 2011 00:38
    URI: http://eprints.fri.uni-lj.si/id/eprint/1308