Prediction of structured values using k nearest neighbours

Mitja Pugelj (2010) Prediction of structured values using k nearest neighbours. EngD thesis.

Abstract

In this work we examine the prediction accuracy of the nearest neighbour method for structured values: multiclass classification and regression, hierarchical multilabel classification, and short time series. We present the problems that arise with this kind of data and techniques for dealing with them. Prediction accuracy is tested on datasets from various (but mostly environmental) domains. For some cases we also examine the influence of different vote (distance) weighting schemes and of feature weighting using the Random Forest method. The method's accuracy is compared to that of predictive clustering rules and trees. We show that the nearest neighbour method is capable of predicting structured data with accuracy comparable to trees and rules. Furthermore, in some cases the method is significantly better than rules. We tested prediction with at most fifteen voting neighbours, and we expect that the method could perform even better when more neighbours are used. We implement three nearest neighbour search methods: simple search, kd tree and vp tree. The methods are compared by the time spent searching a space in which instances are distributed uniformly at random. We conclude that the vp tree is faster than the kd tree in high-dimensional spaces, but that it also has limitations (the curse of dimensionality), and simple search outperforms it in high-dimensional spaces. For our implementations, we derive a simple, experiment-based rule that decides which method to use according to the number of instances, dimensions and voting neighbours. The implementation is done in Java as part of Clus, a system for predictive clustering.
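
As an illustration of the simple search and distance-weighted voting mentioned in the abstract, the following is a minimal Java sketch for the single-target regression case. It is not taken from the thesis or from Clus: the class name `KnnSketch`, the method names, the Euclidean distance and the inverse-distance weight 1/d are all assumptions made for this example, and the abstract does not say which weighting schemes were actually used.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not the Clus implementation.
final class KnnSketch {

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Predicts a numeric target for `query` from its k nearest training
    // instances, weighting each neighbour's vote by 1/distance (an assumed scheme).
    // Expects at least one training instance and k >= 1.
    static double predict(double[][] xs, double[] ys, double[] query, int k) {
        // Simple search: compute the distance to every training instance.
        double[] dist = new double[xs.length];
        Integer[] order = new Integer[xs.length];
        for (int i = 0; i < xs.length; i++) {
            dist[i] = distance(xs[i], query);
            order[i] = i;
        }
        // Sort instance indices by increasing distance to the query.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));

        double weightedSum = 0.0;
        double weightTotal = 0.0;
        int n = Math.min(k, xs.length);
        for (int j = 0; j < n; j++) {
            int idx = order[j];
            if (dist[idx] == 0.0) {
                // Exact match: return its target to avoid dividing by zero.
                return ys[idx];
            }
            double w = 1.0 / dist[idx];
            weightedSum += w * ys[idx];
            weightTotal += w;
        }
        return weightedSum / weightTotal;
    }
}
```

Calling `KnnSketch.predict(xs, ys, query, k)` scans all instances on every query, which is the baseline that tree-based searches such as the kd tree and vp tree try to beat by pruning parts of the space.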

Item Type:

Thesis (EngD thesis)

Keywords:

data mining, structured data prediction, nearest neighbour method, vp tree, Clus.

Number of Pages:

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
doc. dr. Janez Demšar	257	Mentor
prof. dr. Sašo Džeroski		Comentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00007951188)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

1175

Date Deposited:

23 Sep 2010 16:11

Last Modified:

13 Aug 2011 00:37

URI:

http://eprints.fri.uni-lj.si/id/eprint/1175
