Behaviour of FreeViz algorithm in a high-dimensional space

Matija Polajnar (2009) Behaviour of FreeViz algorithm in a high-dimensional space. EngD thesis.

Abstract

FreeViz is a data mining method for local optimization of linear projections. In this thesis we examine the method's performance in the field of genetics. Genetic datasets usually contain many more attributes than instances; we first use linear algebra to predict the problems the method will have on such data. We then look for properties of datasets that influence the performance of FreeViz. The goal of the analysed method is to find good (informative) visualizations, so we estimate its performance by measuring the quality of a k-NN (k Nearest Neighbours) classifier on the projections it yields. The results confirm FreeViz's poor performance on genetic data, although it proved successful on one of the datasets used. In pursuit of dataset properties that influence the method's performance, we generated synthetic datasets. The results show that the ratio between attribute count and instance count has negligible influence. On the other hand, the quality of FreeViz's projections degrades when most of the attributes are redundant and improves when there are mutually correlated attributes. We also observed the paths that attribute projections trace during optimization, but found no rule that distinguishes redundant attributes from the rest. When there is a large number of instances, FreeViz yields a projection that maps redundant attributes closer to the origin. That is not the case when there are more attributes than instances. However, in that case not even a nomogram for a naive Bayesian classifier can distinguish between informative and redundant attributes.
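
The evaluation idea in the abstract (scoring a linear projection by how well a k-NN classifier works on the projected points) can be illustrated with a minimal sketch. This is not the thesis code and it does not run the FreeViz optimization: the anchor positions are picked by hand, the data is a toy set, and the names `project`, `knn_loo_accuracy`, `informative_anchors` and `redundant_anchors` are hypothetical. Leave-one-out accuracy is used here as one possible quality measure; the thesis may use a different one.

```python
import math
import random
from collections import Counter


def project(instances, anchors):
    """Linear projection to 2-D: each attribute has one anchor (x, y),
    and an instance lands at the sum of its attribute values times the anchors."""
    return [
        (sum(v * ax for v, (ax, _) in zip(row, anchors)),
         sum(v * ay for v, (_, ay) in zip(row, anchors)))
        for row in instances
    ]


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on the projected points."""
    correct = 0
    for i, p in enumerate(points):
        # Indices of the k nearest other points (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        votes = Counter(labels[j] for j in neighbours)
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


# Toy data (an assumption for illustration): attribute 0 separates the classes,
# attribute 1 is pure noise and therefore redundant.
rng = random.Random(0)
X, y = [], []
for label, offset in ((0, 0.0), (1, 5.0)):
    for _ in range(10):
        X.append([offset + rng.random(), rng.random()])
        y.append(label)

# Hand-picked anchors: one projection keeps only the informative attribute,
# the other keeps only the redundant one.
informative_anchors = [(1.0, 0.0), (0.0, 0.0)]
redundant_anchors = [(0.0, 0.0), (1.0, 0.0)]

acc_informative = knn_loo_accuracy(project(X, informative_anchors), y)
acc_redundant = knn_loo_accuracy(project(X, redundant_anchors), y)
print(acc_informative, acc_redundant)
```

In this sketch the projection that places the redundant attribute at the origin separates the classes perfectly, while the projection driven only by the redundant attribute carries no class information, which is the kind of difference the k-NN score is meant to expose.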

Item Type:

Thesis (EngD thesis)

Keywords:

visualization, linear projection, FreeViz, genetics, redundancy, correlation, attribute importance

Number of Pages:

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
doc. dr. Janez Demšar	257	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=7296340)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

905

Date Deposited:

09 Sep 2009 15:54

Last Modified:

13 Aug 2011 00:35

URI:

http://eprints.fri.uni-lj.si/id/eprint/905
