John Adeyanju Alao (2011) Visualization of tree models and random forests. EngD thesis.
Machine learning offers several knowledge representation techniques, among which decision and regression trees are very popular. CORElearn is a machine learning package that generates decision and regression trees, k-nearest-neighbor (kNN) models, naive Bayesian models and random forests (models composed of a set of trees). Our aim is to present CORElearn models visually. Decision and regression trees can be presented directly. This is not possible for random forests, where the knowledge is dispersed among several trees, so we need methods capable of considering every single tree in the forest. For example, we compute a proximity matrix from all the trees in a forest. We can also use a sum of classification accuracies per tree to represent the prediction of the whole forest. The tools for random forest comprehension presented in this work are: variable importance, proximity measure, outlier detection, multi-dimensional scaling, clustering and the effect of variables on the class. The direct and indirect visualization methods are comparable. The interpretation of trees is simpler because there is only one image to investigate. On the other hand, random forest methods are more complete.
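The proximity matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition in which the proximity of two instances is the fraction of trees where both reach the same leaf; the thesis may compute it differently. The function name `proximity_matrix` and the `leaf_ids` input are hypothetical and are not part of CORElearn.

```python
def proximity_matrix(leaf_ids):
    """Return an n x n proximity matrix for a forest.

    leaf_ids[t][i] is the identifier of the leaf that instance i
    reaches in tree t (hypothetical input format, assumed here).
    """
    n_trees = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaf_ids:
        for i in range(n):
            for j in range(n):
                # Two instances are "close" in this tree if they share a leaf.
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    # Normalize by the number of trees so every value lies in [0, 1].
    return [[c / n_trees for c in row] for row in counts]


# Made-up example: a forest of 3 trees applied to 4 instances.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [0, 0, 0, 1],
]
prox = proximity_matrix(leaf_ids)
```

Under this assumed definition, `1 - prox[i][j]` can serve as a distance for multi-dimensional scaling or clustering, and an instance with low proximity to all others in its class is a candidate outlier.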
|Item Type: ||Thesis (EngD thesis)|
|Keywords: ||machine learning, decision tree, regression tree, random forest, visualization, R programming language, rpart package, proximity matrix, CORElearn package|
|Number of Pages: ||39|
|Language of Content: ||Slovenian|
|Mentor / Comentors: ||prof. dr. Marko Robnik Šikonja (ID: 276, Function: Mentor)|
|Link to COBISS: ||http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00008451924)|
|Institution: ||University of Ljubljana|
|Department: ||Faculty of Computer and Information Science|
|Item ID: ||1377|
|Date Deposited: ||09 Jun 2011 11:11|
|Last Modified: ||31 Aug 2011 14:42|