Grega Kešpret (2012) Estimation of Prediction Reliabilities in Regression Modelling of Data Streams. EngD thesis.
Abstract
In traditional machine learning problems, the key restriction is usually the sample size rather than computational power. Nowadays, data stream sources continuously generate huge amounts of data from non-stationary distributions, so traditional modelling approaches are becoming obsolete. Several restrictions call for new, incremental approaches: finite memory in the face of a potentially unlimited amount of data, possibly low computational power of nodes, sudden changes in the data-generating process, the need to handle data in real time, and others. In this thesis we develop and evaluate a system for predicting electricity consumption based on data stream ideas. We first developed a method for detecting and correcting anomalies in the data, and then implemented 8 different prediction models and evaluated their prediction accuracy on real data. We also investigated prediction reliability estimates and the correction of predictions based on those estimates. Finally, we present experimental results obtained on real data from 11 data streams covering different areas of New York State in the USA. We also discuss the feasibility of using the reliability estimates CNK and SAbias to correct initial predictions.