Rok Močnik (2011) Reliability estimation for prediction of effects of small molecules. EngD thesis.
Abstract
Today, in all areas of human activity, we gather enormous amounts of data, more than ever before. This data hides important information and knowledge. The amount of data is overwhelming for human capabilities, so we try to develop computer systems to help us process it. Much of the essential research in this field is done within machine learning, a subfield of artificial intelligence. Machine learning often deals with predicting the classes of examples described by attribute values. The success of predictions is usually estimated over the whole test set. In this thesis we are interested in whether we can estimate, for a specific example, if the prediction for that example is accurate or not. To solve this problem, we implemented a set of already known methods for reliability estimation of individual predictions. The methods were developed within Orange, a Python-based data mining suite. The methods were tested on data about quantitative structure-activity relationships. Among all the methods tested, the best-performing was the approach that selects the technique for reliability estimation based on internal cross-validation. However, this method has a flaw: on bigger datasets it is computationally very demanding. In search of an efficient solution, we propose a new method for reliability estimation that works only in association with random forest. The method uses the variance within a specific prediction of the random forest as the reliability estimate. This approach is quick, because it adds nothing to the random forest except the calculation of the variance. It also shows good results on bigger datasets.
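The variance-based idea can be illustrated with a minimal sketch. It assumes that the "variance inside a specific prediction" means the spread of the individual trees' outputs for one example, which the abstract does not spell out. The function name and the sample numbers are hypothetical, and the code uses only the Python standard library rather than Orange's own API.

```python
from statistics import mean, pvariance


def forest_prediction_with_variance(tree_predictions):
    """Return (prediction, variance) for one example.

    tree_predictions: numeric outputs of the individual trees for this
    example (assumed interpretation of the method in the abstract).
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    # The forest's prediction is the average of the tree outputs.
    prediction = mean(tree_predictions)
    # Population variance of the tree outputs around that average;
    # a larger value is read as a less reliable prediction.
    spread = pvariance(tree_predictions, mu=prediction)
    return prediction, spread


# Hypothetical per-tree outputs for two examples.
agreeing = [2.0, 2.1, 1.9, 2.0]      # trees agree closely
disagreeing = [0.5, 3.5, 1.0, 3.0]   # trees disagree strongly

pred_a, var_a = forest_prediction_with_variance(agreeing)
pred_d, var_d = forest_prediction_with_variance(disagreeing)
```

Both examples receive nearly the same averaged prediction, but the second has a much larger variance, so under this reading it would be flagged as less reliable. The only extra cost over an ordinary random forest prediction is the variance computation itself.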