Comparison of the methods for pattern clustering

Nejc Ilc (2009) Comparison of the methods for pattern clustering. EngD thesis.

Abstract

Clustering, or cluster analysis, is a fundamental machine learning task. Unfortunately, it is an ill-posed problem because of the large diversity of problem domains. Many different approaches have been used to solve it, which has resulted in a long list of clustering methods. Moreover, it is hard to determine whether one clustering of a particular data set is better than another, because no universal similarity metric exists that would be the most appropriate for all problems. In this thesis, four clustering methods are examined, each with its own interesting features: KMC, ECMC, EM GMM, and CSC. In addition, new criteria for evaluating clustering correctness keep appearing, and these are themselves subject to mutual comparison. My intention was to carry out a comprehensive analysis of the chosen methods and to objectively evaluate the clustering results for each typical problem domain. To achieve this, four internal and six external evaluation criteria (indices) were used, and the final evaluation of the effectiveness of the methods is based on them. Several synthetic and real data sets on which the clustering was performed were selected to reflect the typical problems in this field. The final results of the comparison show that applying knowledge from information theory, which the novel CSC method exploits, contributes to a better outcome, depending on the selected criteria and data sets. The method also leaves considerable room for further improvement and motivates the use of alternative approaches to the clustering problem.

Item Type:

Thesis (EngD thesis)

Keywords:

Clustering, method comparison, internal indices, external validation

Number of Pages:

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
prof. dr. Andrej Dobnikar	234	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=6896468)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

795

Date Deposited:

07 Jan 2009 10:23

Last Modified:

13 Aug 2011 00:34

URI:

http://eprints.fri.uni-lj.si/id/eprint/795
