ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Machine learning techniques for UCSD Data Mining Contest

Anže Starič (2010) Machine learning techniques for UCSD Data Mining Contest. EngD thesis.

    Abstract

    By participating in machine learning competitions, we become acquainted with new problem domains and new types of problems. We are forced to look for and try out new techniques and to search for innovative problem-solving approaches. In the UCSD Data Mining Contest, our task was to rank a pool of consumers by how likely each one is to become a customer of the retailer. In this thesis we developed a technique for predicting the probability that a consumer becomes a customer of the retailer. Standard machine learning algorithms were evaluated, and attribute analysis was performed on the training dataset. To improve on the scores of the standard algorithms, we reviewed methods that augment Naive Bayes for ranking and implemented the most promising one in the Orange framework. We also assessed the impact of data discretization on Naive Bayes and evaluated ensemble techniques that combine Naive Bayes classifiers. The results show that ranking potential customers is a hard task for standard machine learning algorithms. Augmented Naive Bayes performed slightly better in terms of AUC, but the best results were produced by combining data discretization with the standard Naive Bayes classifier. The AUC scores achieved were relatively low compared to scores achieved on other machine learning problems. This suggests that more attributes should be added to the dataset before the method is used in a production environment.
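The combination the abstract reports as best, discretization followed by a standard Naive Bayes classifier and scored by AUC, can be illustrated with a minimal sketch. This is not the thesis code and does not use the Orange API: the helper names (`auc`, `equal_width_bins`, `fit_naive_bayes`, `demo`), the choice of equal-width binning, the Laplace smoothing, and the synthetic two-attribute data are all assumptions made only for illustration.

```python
import math
import random


def auc(scores, labels):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def equal_width_bins(values, k):
    """Learn k equal-width bins from `values`; return a function mapping a value to a bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all values are identical
    # Values outside the training range are clamped to the first or last bin.
    return lambda v: min(k - 1, max(0, int((v - lo) / width)))


def fit_naive_bayes(rows, labels, k):
    """Categorical Naive Bayes with Laplace smoothing; returns a function giving P(class 1 | row)."""
    n_attr = len(rows[0])
    class_count = {0: 0, 1: 0}
    counts = {c: [[0] * k for _ in range(n_attr)] for c in (0, 1)}
    for row, y in zip(rows, labels):
        class_count[y] += 1
        for j, b in enumerate(row):
            counts[y][j][b] += 1
    total = len(labels)

    def predict_proba(row):
        log_p = {}
        for c in (0, 1):
            lp = math.log((class_count[c] + 1) / (total + 2))
            for j, b in enumerate(row):
                lp += math.log((counts[c][j][b] + 1) / (class_count[c] + k))
            log_p[c] = lp
        m = max(log_p.values())  # subtract the maximum for numerical stability
        e0, e1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
        return e1 / (e0 + e1)

    return predict_proba


def demo(seed=0, n_train=400, n_test=200, k=5):
    """Train on synthetic data (an assumption, not the contest data) and return the test AUC."""
    rng = random.Random(seed)

    def sample(n):
        labels = [rng.randint(0, 1) for _ in range(n)]
        rows = [[rng.gauss(float(y), 1.0) for _ in range(2)] for y in labels]
        return rows, labels

    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)
    # Bin boundaries are learned on the training data only.
    binners = [equal_width_bins([r[j] for r in train_x], k) for j in range(2)]
    discretize = lambda rows: [[binners[j](r[j]) for j in range(2)] for r in rows]
    model = fit_naive_bayes(discretize(train_x), train_y, k)
    scores = [model(r) for r in discretize(test_x)]
    return auc(scores, test_y)


print(f"test AUC: {demo():.3f}")
```

Because AUC depends only on the ordering of the predicted probabilities, it matches the ranking task described above: consumers are sorted by the model's score rather than assigned hard class labels.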

    Item Type: Thesis (EngD thesis)
    Keywords: machine learning, naive Bayes, ranking, ensemble techniques
    Number of Pages: 35
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    prof. dr. Blaž Zupan | 106 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00007963220)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1159
    Date Deposited: 08 Sep 2010 18:27
    Last Modified: 13 Aug 2011 00:37
    URI: http://eprints.fri.uni-lj.si/id/eprint/1159
