ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Attribute evaluation on imbalanced data sets

Domen Rački (2011) Attribute evaluation on imbalanced data sets. EngD thesis.

    Abstract

    We analyze how attribute evaluation measures perform on imbalanced datasets at different levels of class imbalance. We sample real-world datasets at class ratios of 1:5, 1:10, 1:50, 1:100, 1:500 and 1:1000. For each attribute evaluation measure we build decision tree models and compute AUC using stratified 5x2 cross-validation. We test the significance of the differences with Friedman's test, and use Nemenyi's test to determine and graphically display the similarities and differences between measures. At unaltered class ratios the best performing measure is MDL; at a ratio of 1:5 it is the angular distance; at ratios 1:10 and 1:50 it is ReliefF; and at ratios 1:100, 1:500 and 1:1000 it is information gain. The worst performing measure at all class ratios is accuracy.
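
As a rough illustration of two steps named in the abstract, the sketch below subsamples a dataset to a 1:r class ratio and computes the Friedman statistic from per-dataset AUC scores. It is a minimal sketch under stated assumptions, not the thesis's implementation (the thesis lists CORElearn among its keywords). The function names, the choice to undersample the minority class while keeping every majority example, and the toy AUC values are all hypothetical.

```python
import random


def subsample_to_ratio(labels, minority, ratio, seed=0):
    """Return indices giving roughly a 1:ratio minority-to-majority split.

    Assumption: all majority examples are kept and the minority class is
    randomly undersampled; the thesis may have sampled differently.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    target = max(1, len(majority_idx) // ratio)
    if target > len(minority_idx):
        raise ValueError("too few minority examples for this ratio")
    return sorted(rng.sample(minority_idx, target) + majority_idx)


def average_ranks(row):
    """Rank the scores in one row (1 = highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in order[pos:end + 1]:
            ranks[j] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[i][j] is the AUC of measure j on dataset i.

    Returns the mean rank of each measure and the Friedman chi-square
    statistic 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4).
    """
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Toy AUC values, invented for illustration only (not results from the thesis).
toy_auc = [
    [0.90, 0.85, 0.70],
    [0.88, 0.80, 0.65],
    [0.92, 0.81, 0.60],
]
mean_ranks, chi2 = friedman_statistic(toy_auc)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom. If the null hypothesis is rejected, Nemenyi's test compares the mean ranks pairwise using a critical difference that depends on k, N and a tabulated critical value.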

    Item Type: Thesis (EngD thesis)
    Keywords: machine learning, imbalanced datasets, attribute evaluation, CORElearn, decision trees
    Number of Pages: 66
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname                 ID    Function
    prof. dr. Marko Robnik Šikonja   276   Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00008631892)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1510
    Date Deposited: 16 Sep 2011 13:02
    Last Modified: 26 Sep 2011 18:42
    URI: http://eprints.fri.uni-lj.si/id/eprint/1510
