ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

An efficient explanation of regression and classification models’ predictions

Erik Štrumbelj (2011) An efficient explanation of regression and classification models’ predictions. PhD thesis.


    Abstract

    Providing an explanation for a prediction model's output is an important part of knowledge discovery. It enables a quicker and better understanding of the model and increases the user's trust in the model's predictions. We focus on methods that assign to each input feature its contribution to the model's prediction. Most such methods are model-specific, but some can be applied to any type of model, which simplifies their use and enables the comparison of different types of models. Existing general methods do not take into account interactions across all subsets of input features and in certain cases fail to provide a proper explanation. We propose a general method that takes all interactions into account and addresses the shortcomings of existing methods. We prove that the proposed method is related to the Shapley value, a well-known concept from game theory. We deal with the resulting exponential time complexity by using an approximation, and we use selective sampling and quasi-random sampling to further improve the efficiency of the approximation algorithm. We also propose a mechanism that allows the user to select a tradeoff between total running time and expected error. Synthetic and real-world data sets and several different classification and regression models are used to empirically demonstrate the practical utility of the proposed method. We describe how the method was applied to a real-world breast cancer recurrence problem and how oncologists confirmed its usefulness, and we list other successful applications of the proposed method. Finally, we conducted an experiment that tested users' ability to learn from examples (with or without an explanation) and to make predictions for new, unknown instances. The results reveal that explanations based on the contributions of input features help users and increase the accuracy of their predictions.
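    The core idea described in the abstract, estimating each feature's Shapley-value contribution by sampling rather than enumerating all feature subsets, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a single baseline instance supplies the "absent" feature values (the thesis samples absent values from the data distribution and additionally uses selective and quasi-random sampling), and the function and parameter names (`shapley_contribution`, `baseline`, `n_samples`) are hypothetical.

    ```python
    import random

    def shapley_contribution(f, x, baseline, j, n_samples=1000, rng=None):
        """Monte Carlo estimate of feature j's contribution to f(x).

        For each sampled permutation of the features, features preceding j
        (and j itself) take their values from x; the rest are filled in from
        the baseline instance. The averaged difference made by adding j is
        an unbiased estimate of its Shapley value under this simplification.
        """
        rng = rng or random.Random(0)
        n = len(x)
        features = list(range(n))
        total = 0.0
        for _ in range(n_samples):
            perm = features[:]
            rng.shuffle(perm)
            pos = perm.index(j)
            # instance with feature j "present" (taken from x)
            with_j = [x[i] if i in perm[:pos + 1] else baseline[i] for i in range(n)]
            # same instance, but feature j "absent" (taken from the baseline)
            without_j = [x[i] if i in perm[:pos] else baseline[i] for i in range(n)]
            total += f(with_j) - f(without_j)
        return total / n_samples
    ```

    For a model with no feature interactions, e.g. the linear function f(z) = 2·z[0] + z[1], every permutation yields the same marginal difference, so the estimate equals the exact contribution (2 for feature 0 and 1 for feature 1 when x = [1, 1] and the baseline is [0, 0]). For models with interactions, the estimate converges as n_samples grows, which is where the running-time/error tradeoff discussed in the abstract comes in.
    
    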

    Item Type: Thesis (PhD thesis)
    Keywords: knowledge discovery in data, interpretation, visualization, feature importance
    Number of Pages: 118
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    prof. dr. Igor Kononenko | 237 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00008729684)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1566
    Date Deposited: 15 Oct 2011 01:27
    Last Modified: 02 Nov 2011 16:57
    URI: http://eprints.fri.uni-lj.si/id/eprint/1566
