Matija Polajnar (2009) Behaviour of FreeViz algorithm in a high-dimensional space. EngD thesis.
Abstract
FreeViz is a data mining method for local optimization of linear projections. In this thesis we examine the method's performance on data from the field of genetics. Genetic datasets usually contain many more attributes than instances; we first use linear algebra to predict the method's problems on such data. We then try to find the properties of datasets that influence the performance of FreeViz. The goal of the analysed method is to find good (informative) visualizations, so we estimate its performance by measuring the quality of a k-NN (k-nearest neighbours) classifier on the projections it yields. The results confirm FreeViz's poor performance on genetic data, although it proved successful on one of the datasets used. In pursuit of the dataset properties that influence the method's performance, we generated synthetic datasets. The results show that the ratio between attribute count and instance count has negligible influence. On the other hand, the quality of FreeViz's projections is degraded when most of the attributes are redundant and improved when there are mutually correlated attributes. We also observed the paths that attribute projections trace during optimization, but found no rule that distinguishes redundant attributes from the rest. When there is a large number of instances, FreeViz yields a projection that maps redundant attributes closer to the origin. That is not the case when there are more attributes than instances. However, in that case not even a nomogram for a naive Bayesian classifier can distinguish between informative and redundant attributes.
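The evaluation idea described in the abstract, scoring a linear projection by how well a k-NN classifier performs on the projected points, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `project` and `loo_knn_accuracy`, the leave-one-out scoring, and the toy data are all assumptions made for illustration, and the FreeViz optimization itself is not shown.

```python
import math
from collections import Counter


def project(X, anchors):
    """Linearly project instances to 2-D.

    X is a list of instances (each a list of d attribute values);
    anchors is a list of d (x, y) pairs, one per attribute.
    """
    return [
        (
            sum(value * anchors[j][0] for j, value in enumerate(x)),
            sum(value * anchors[j][1] for j, value in enumerate(x)),
        )
        for x in X
    ]


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on projected points."""
    correct = 0
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        neighbours = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )[:k]
        votes = Counter(label for _, label in neighbours)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


# Toy data (hypothetical): attribute 0 separates the classes,
# attribute 1 carries no class information.
X = [[0.0, 0.5], [0.1, 0.1], [0.2, 0.9],
     [1.0, 0.4], [1.1, 0.8], [0.9, 0.2]]
y = [0, 0, 0, 1, 1, 1]

informative = [(1.0, 0.0), (0.0, 0.0)]  # redundant attribute at the origin
redundant = [(0.0, 0.0), (1.0, 0.0)]    # informative attribute at the origin

good_score = loo_knn_accuracy(project(X, informative), y, k=1)
bad_score = loo_knn_accuracy(project(X, redundant), y, k=1)
```

In this sketch a projection that places the redundant attribute at the origin scores higher than one that discards the informative attribute, which is the sense in which classifier accuracy serves as a proxy for how informative a visualization is.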