Machine learning techniques for UCSD Data Mining Contest

Anže Starič (2010) Machine learning techniques for UCSD Data Mining Contest. EngD thesis.

Abstract

Participating in machine learning competitions acquaints us with new problem domains and new types of problems, and forces us to try out new techniques and search for innovative approaches to problem solving. In the UCSD Data Mining Contest, our task was to rank a pool of consumers according to how likely each one is to become a customer of the retailer. In this thesis we developed a technique for predicting the probability that a consumer becomes a customer of the retailer. We evaluated standard machine learning algorithms and performed attribute analysis on the training dataset. To improve on the scores of the standard algorithms, we also reviewed methods that augment Naive Bayes for ranking and implemented the most promising one in the Orange framework. We further assessed the impact of data discretization on Naive Bayes and evaluated ensemble techniques that combine Naive Bayes classifiers. The results show that ranking potential customers is indeed a hard task for standard machine learning algorithms. Augmented Naive Bayes performed slightly better in terms of AUC, but the best results were produced by combining data discretization with the standard Naive Bayes classifier. The AUC scores achieved were relatively low compared to scores achieved on other machine learning problems. This suggests that more attributes should be added to the dataset before this method is used in a production environment.
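As a rough illustration of the pipeline the abstract describes (discretize the attributes, fit Naive Bayes, rank by predicted probability, score the ranking with AUC), here is a minimal pure-Python sketch. It is not the thesis code and does not use Orange. The equal-width binning, the Laplace smoothing, the toy data, and all function names (`fit_cut_points`, `discretize`, `train_naive_bayes`, `score`, `auc`) are assumptions made for this example; the abstract does not say which discretization method was used.

```python
import math
from collections import Counter

def fit_cut_points(values, n_bins):
    """Equal-width cut points (an assumed discretization scheme)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def discretize(value, cut_points):
    """Bin index = number of cut points the value reaches or exceeds."""
    return sum(value >= c for c in cut_points)

def train_naive_bayes(rows, labels, n_bins):
    """rows: lists of bin indices, one entry per attribute; labels: 0 or 1."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[y][j][v] += 1
    return class_counts, counts, n_bins

def score(model, row):
    """Estimate P(class = 1 | row) with Laplace-smoothed Naive Bayes."""
    class_counts, counts, n_bins = model
    total = sum(class_counts.values())
    log_post = {}
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)
        for j, v in enumerate(row):
            lp += math.log((counts[c][j][v] + 1) / (n_c + n_bins))
        log_post[c] = lp
    # Normalize in log space to avoid underflow.
    m = max(log_post.values())
    unnorm = {c: math.exp(lp - m) for c, lp in log_post.items()}
    return unnorm.get(1, 0.0) / sum(unnorm.values())

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: one numeric attribute, label 1 = became a customer.
values = [0.1, 0.4, 0.5, 2.0, 2.5, 2.9]
labels = [0, 0, 0, 1, 1, 1]
cuts = fit_cut_points(values, 3)
rows = [[discretize(v, cuts)] for v in values]
model = train_naive_bayes(rows, labels, n_bins=3)
scores = [score(model, r) for r in rows]
toy_auc = auc(scores, labels)
```

The toy example scores the same rows it was trained on, which only shows the mechanics; a real evaluation would compute AUC on held-out data or with cross-validation.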

Item Type:

Thesis (EngD thesis)

Keywords:

machine learning, naive Bayes, ranking, ensemble techniques

Number of Pages:

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
prof. dr. Blaž Zupan	106	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00007963220)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

1159

Date Deposited:

08 Sep 2010 18:27

Last Modified:

13 Aug 2011 00:37

URI:

http://eprints.fri.uni-lj.si/id/eprint/1159
