Mitja Pugelj (2010) Prediction of structured values using k nearest neighbours. EngD thesis.
Abstract
In this work we study the prediction accuracy of the nearest neighbour method for predicting structured values: multiclass classification and regression, hierarchical multilabel classification, and short time series. We present the problems and techniques involved in dealing with this kind of data. Prediction accuracy is tested on datasets from various (mostly environmental) domains. In some cases we also examine the influence of different vote (distance) weighting schemes and of feature weighting using the Random Forest method. The method's accuracy is compared with that of predictive clustering rules and trees. We show that the nearest neighbour method can predict structured data with accuracy comparable to trees and rules. Furthermore, in some cases the method is significantly better than rules. We tested prediction with at most fifteen voting neighbours, and we expect that the method could perform even better when more neighbours are used. We implement three nearest neighbour search methods: simple search, kd tree and vp tree. The methods are compared by the time spent searching a space in which instances are distributed uniformly at random. We conclude that the vp tree is faster than the kd tree in high-dimensional spaces, but it also has limitations (the curse of dimensionality), and simple search outperforms it in high-dimensional spaces. For our implementations we derive a simple, experiment-based rule that decides which method to use according to the number of instances, dimensions and voting neighbours. The implementation is done in Java as part of Clus, a system for predictive clustering.
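To make the "simple search" and vote (distance) weighting ideas concrete, here is a minimal Java sketch for the plain regression case. It is an illustration only and not the Clus implementation: the class name `SimpleKnn`, its method names, the choice of Euclidean distance and the inverse-distance weighting formula are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical illustration of k-NN with simple (linear scan) search; not taken from Clus. */
final class SimpleKnn {

    /** Euclidean distance between two points of equal dimension (an assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Indices of the k points nearest to the query, closest first, found by scanning all points. */
    static int[] nearest(double[][] points, double[] query, int k) {
        final double[] dist = new double[points.length];
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < points.length; i++) {
            dist[i] = distance(points[i], query);
            order[i] = i;
        }
        // Sort all indices by distance; a bounded heap would avoid the full sort.
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));
        int n = Math.min(k, order.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = order[i];
        }
        return result;
    }

    /**
     * Predicts a numeric target as the (optionally weighted) mean over the k nearest neighbours.
     * With weighting enabled each neighbour votes with weight 1/distance (an assumed scheme).
     */
    static double predict(double[][] points, double[] targets, double[] query,
                          int k, boolean weighted) {
        double weightSum = 0.0;
        double valueSum = 0.0;
        for (int idx : nearest(points, query, k)) {
            double d = distance(points[idx], query);
            if (weighted && d == 0.0) {
                return targets[idx]; // exact match: avoid division by zero
            }
            double w = weighted ? 1.0 / d : 1.0;
            weightSum += w;
            valueSum += w * targets[idx];
        }
        return valueSum / weightSum;
    }
}
```

For example, `SimpleKnn.predict(points, targets, query, 15, true)` would use fifteen voting neighbours with inverse-distance weights. A kd tree or vp tree would replace only the `nearest` step and leave the voting unchanged.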