ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Martin Možina (2003). Prešeren awards for students.

Full text not available from this repository.

Abstract

The naive Bayesian classifier is one of the simplest yet surprisingly powerful techniques for constructing a predictive model from classified data. Despite its naivety (the assumption that attributes are independent given the class), empirical results show that it performs well even in many domains with clear attribute dependencies. In this work we compare the naive Bayesian classifier theoretically, experimentally and graphically to logistic regression, a standard predictive modelling method from statistics. In contrast to the naive Bayesian classifier, logistic regression makes no such independence assumption when constructing a predictive model. We show that the naive Bayesian classifier can be written in an alternative mathematical form (log odds), which makes it directly comparable to logistic regression, and we prove that the two methods are mathematically equivalent when the attributes are conditionally independent given the class. It follows that the differences between the two methods stem from dependencies among attributes in the data. For visual presentation of the naive Bayesian classifier we develop a normalized naive Bayesian nomogram based on the logistic regression nomogram. We further improve the naive Bayesian nomogram so that it depicts both negative and positive influences of attribute values and, unlike logistic regression nomograms, does not align the "base" values to the zero point. Another advantage over the logistic regression nomogram is the ability to handle unknown attribute values. We compare the two methods through visualization and through a study of predictive accuracy. Overall, the experiments show very similar results: logistic regression performs slightly better when learning from large data sets, while the naive Bayesian classifier turns out to be better on smaller data sets. We conclude that the naive Bayesian classifier performs similarly to logistic regression in most cases. Logistic regression seems preferable when learning from large data sets, leaving aside computational issues and matters such as handling missing data, whereas the naive Bayesian classifier proves successful on less deterministic learning problems. Moreover, when model understanding and graphical presentation of the model are important, the naive Bayesian classifier is the better method.
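
The key step in the comparison is the log-odds form of the naive Bayesian classifier. As a sketch in standard notation (the symbols below are ours, not taken from the thesis): for a binary class c and attributes a_1, ..., a_n, the conditional independence assumption gives

    \log\frac{P(c=1 \mid a_1,\dots,a_n)}{P(c=0 \mid a_1,\dots,a_n)}
      = \log\frac{P(c=1)}{P(c=0)} + \sum_{i=1}^{n} \log\frac{P(a_i \mid c=1)}{P(a_i \mid c=0)},

while logistic regression models the same log odds as a linear function of the (indicator-coded) attribute values,

    \log\frac{P(c=1 \mid a_1,\dots,a_n)}{P(c=0 \mid a_1,\dots,a_n)} = \beta_0 + \sum_{i=1}^{n} \beta_i a_i .

When the attributes really are conditionally independent given the class, the log likelihood ratios play the role of the coefficients \beta_i and the two models coincide; both are additive on the log-odds scale, which is also what allows each attribute's contribution to be drawn as a separate axis of a nomogram.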
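
As a purely illustrative sketch of the kind of accuracy comparison described above (scikit-learn models on a synthetic data set; the data, models and sample sizes are our assumptions, not the experimental setup of the thesis):

    # Illustrative only: compare naive Bayes and logistic regression
    # on training sets of increasing size (assumed setup, not the thesis's).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split

    # Synthetic binary classification data.
    X, y = make_classification(n_samples=5000, n_features=10,
                               n_informative=6, random_state=0)

    for n_train in (50, 200, 1000, 4000):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=n_train, random_state=0, stratify=y)
        nb = GaussianNB().fit(X_tr, y_tr)
        lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # .score reports classification accuracy on the held-out data.
        print(f"n={n_train:5d}  NB acc={nb.score(X_te, y_te):.3f}  "
              f"LR acc={lr.score(X_te, y_te):.3f}")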

Item Type: Thesis (Prešeren awards for students)
Keywords:
Number of Pages: 40
Language of Content: Slovenian
Mentor / Comentors:
Name and Surname     | ID  | Function
prof. dr. Blaž Zupan | 106 | Mentor
Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=4033876)
Institution: University of Ljubljana
Department: Faculty of Computer and Information Science
Item ID: 3724
Date Deposited: 05 Jan 2017 17:10
Last Modified: 13 Feb 2017 09:30
URI: http://eprints.fri.uni-lj.si/id/eprint/3724
