Gregor Leban (2007) Data visualization using machine learning. PhD thesis.
Abstract
Data visualization is a tool with enormous potential for extracting knowledge from data. Visualizing the right set of features in the right way can clearly reveal interesting and potentially useful patterns. However, not all data projections are equally interesting, and the data miner's task is to find the most insightful ones. To help the user, we developed a method called VizRank, which automatically computes an estimate of interestingness for each possible projection of class-labeled data. Projections can then be ranked by this score, so that the user can focus on a small subset of the best-ranked projections, which provide the greatest insight into the data. VizRank can be applied to any visualization method that maps attribute values to the position of a displayed symbol, such as scatterplots, radviz, polyviz and general linear projections. We also extended the concept of projection ranking to the parallel coordinates method and to mosaic diagrams. To demonstrate the usefulness of the developed algorithms, we present results on data sets from the UCI repository and from cancer microarray data analysis.
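The idea of scoring and ranking projections can be illustrated with a minimal sketch restricted to scatterplots, where each candidate projection is a pair of attributes. The abstract does not state which interestingness measure VizRank uses, so this sketch assumes a stand-in score: the leave-one-out accuracy of a k-nearest-neighbour classifier on the projected 2D points (class-labeled points that separate well in the plot score high). The function names `normalize`, `knn_loo_accuracy` and `rank_scatterplots` are hypothetical and are not taken from the thesis or from any library.

```python
import math
from collections import Counter
from itertools import combinations


def normalize(column):
    """Scale one attribute to [0, 1] so both plot axes have equal weight."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def knn_loo_accuracy(points, labels, k=3):
    """Assumed interestingness score: leave-one-out k-NN accuracy in 2D."""
    n = len(points)
    correct = 0
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        neighbours = sorted(
            ((math.dist(points[i], points[j]), labels[j]) for j in range(n) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        # Majority vote among the k nearest neighbours.
        predicted = Counter(label for _, label in neighbours).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n


def rank_scatterplots(data, labels, names, k=3):
    """Score every attribute pair and return (score, attr_a, attr_b), best first."""
    columns = [normalize([row[a] for row in data]) for a in range(len(names))]
    ranking = []
    for a, b in combinations(range(len(names)), 2):
        points = list(zip(columns[a], columns[b]))
        ranking.append((knn_loo_accuracy(points, labels, k), names[a], names[b]))
    ranking.sort(key=lambda entry: entry[0], reverse=True)
    return ranking


if __name__ == "__main__":
    # Toy class-labeled data: x and y separate the classes, z is noise.
    data = [
        (0.0, 0.1, 0.5), (0.1, 0.0, 0.9), (0.2, 0.2, 0.1),
        (0.9, 1.0, 0.4), (1.0, 0.9, 0.8), (0.8, 0.8, 0.2),
    ]
    labels = ["a", "a", "a", "b", "b", "b"]
    for score, a, b in rank_scatterplots(data, labels, ["x", "y", "z"]):
        print(f"{a} vs {b}: {score:.2f}")
```

Exhaustively scoring all attribute pairs is only feasible for a modest number of attributes; for methods such as radviz or general linear projections the space of candidate projections grows much faster, so a practical ranking would need some heuristic for choosing which projections to evaluate.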