Darko Pevec (2013) Reliability estimation of individual predictions in supervised learning. PhD thesis.
Abstract
The thesis discusses reliability estimation of individual predictions in the supervised learning framework. This research field is rather new and currently attracts little attention from experts and users alike; the main indicators of success of machine learning models are consequently still measures of average performance, such as classification accuracy or root mean squared error. Averaged statistics, however, cannot provide a full view of a model's performance, and a growing number of machine learning users are interested in additional information that helps them better understand a model's results. Such information is all the more important where wrong predictions may lead to serious financial losses or medical complications, and experts may become reluctant to use prediction systems whose predictions are not backed by reliability assessments. The prevalent methods of model assessment are thus insufficient in fields where decision support is of crucial importance: there, average performance is not paramount, and information on the reliability of individual predictions can be very beneficial.

The central concern in applying machine learning algorithms is whether the chosen model represents the data well: do the predicted values conform to the dataset, or has the model learned a wrong concept or even over-fitted to noise in the data? Because we want to take all possible machine learning models into account, we treat them as black boxes, meaning we have access only to their input (the training examples) and their output (their predictions).

This work presents a complete overview of reliability estimators for supervised learning. Classification and regression are treated separately, owing to their inherent differences, and we further distinguish between point-wise and interval estimators. Interestingly, point-wise estimators can be applied to both classification and regression, whereas interval estimators are defined only for regression.

The first contribution of this thesis is a new comparative study of the usefulness of point-wise estimators in the classification setting. The analysis and comparison with a reference function show that this kind of reliability estimation is rarely useful on real-world datasets. However, when we have to work with a suboptimal model and the point-wise estimators conform with the data, they can improve the results and provide additional information.

Regarding interval estimation, the thesis contributes a novel, unifying view of reliability estimation that enables a comparison of methods which was not possible before. Our analysis reveals the dual nature of the two families of approaches: methods based on the bootstrap and maximum likelihood estimation provide valid prediction intervals, while methods based on local neighborhoods provide optimal prediction intervals. Based on this finding, we present a combined approach that merges the properties of the two groups; its results are favorable, indicating that the combined prediction intervals are more robust.
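To make the bootstrap side of this duality concrete, the following is a minimal sketch of bootstrap-based prediction intervals for a black-box regressor. It is not the thesis's exact procedure: the choice of scikit-learn's RandomForestRegressor as the black-box model and the use of out-of-bag residuals as the noise estimate are illustrative assumptions.

```python
# Sketch: bootstrap prediction intervals for a black-box regressor.
# Assumptions (not from the thesis): RandomForestRegressor stands in
# for the black-box model, and out-of-bag residuals estimate the
# noise component of the interval.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def bootstrap_prediction_interval(X_train, y_train, x_new,
                                  n_boot=100, alpha=0.1, seed=0):
    """Return a point prediction and a (1 - alpha) prediction interval."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds, residuals = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)              # bootstrap resample
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X_train[idx], y_train[idx])
        preds.append(model.predict(x_new.reshape(1, -1))[0])
        oob = np.setdiff1d(np.arange(n), idx)         # out-of-bag examples
        residuals.extend(y_train[oob] - model.predict(X_train[oob]))
    preds = np.asarray(preds)
    # Model variance comes from the spread of the bootstrap predictions,
    # noise variance from the out-of-bag residuals; their sum yields
    # samples from which the interval endpoints are read off.
    samples = preds + rng.choice(np.asarray(residuals), size=n_boot)
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return float(preds.mean()), (float(lo), float(hi))
```

Validity in the sense used above can then be checked empirically: on held-out data, roughly a (1 - alpha) fraction of the true targets should fall inside the returned intervals.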
Existing statistics that report only a model's average accuracy are not truly informative; appropriate graphic visualizations, on the other hand, are known to be very useful for developing users' intuition about and understanding of a model's behavior. After demonstrating an existing visualization tool for comparing prediction intervals, we present a new visualization technique that enables model comparison and has potential for knowledge discovery. The final contribution is a model aggregation procedure based on a combined statistic for robust selection and merging of regression predictions; this new evaluation statistic and aggregation procedure yield confirmatory and consequently more reliable predictions.
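The aggregation procedure itself is defined in the thesis; as a rough illustration of the general idea of selecting and merging regression predictions, the sketch below weights each fitted model by its held-out accuracy and reports the disagreement between models as a simple reliability signal. The inverse-MSE weighting rule is an assumption for illustration only, not the combined statistic proposed in the thesis.

```python
# Sketch: merging regression predictions from several fitted models.
# The inverse-MSE weighting is an illustrative assumption, not the
# thesis's combined selection/merging statistic.
import numpy as np

def aggregate_predictions(models, X_val, y_val, x_new):
    """Weighted merge of predictions; lower validation MSE => larger weight."""
    weights, preds = [], []
    for model in models:
        mse = np.mean((model.predict(X_val) - y_val) ** 2)
        weights.append(1.0 / (mse + 1e-12))   # guard against division by zero
        preds.append(model.predict(x_new.reshape(1, -1))[0])
    weights = np.asarray(weights) / np.sum(weights)
    # The spread of the individual predictions serves as a crude
    # confirmatory signal: small spread means the models agree.
    return float(np.dot(weights, preds)), float(np.std(preds))
```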