Learning from textual data streams for detecting email spam

Jernej Porenta (2016) Learning from textual data streams for detecting email spam. MSc thesis.

Abstract

This master's thesis introduces a method for detecting email spam by translating the problem into one of incremental learning from time series. Common spam detection systems mainly use supervised learning methods (naive Bayes classifier, decision trees), whereas this thesis performs the classification with data stream mining methods. For the learning sets, we chose only attributes that do not contain personal data and whose use does not require the consent of the sender or the recipient (the attributes come from the envelope part of the email). Using algorithms for learning from data streams (VFDT, CVFDT), we treated the sequence of email messages as a textual data stream. The results were compared with traditional spam detection methods; they show that the traditional methods achieve higher accuracy than the algorithms for learning from data streams, which are therefore not suitable for detecting email spam.

Item Type:

Thesis (MSc thesis)

Keywords:

email, machine learning, stream mining

Number of Pages:

Language of Content:

Slovenian

Mentor / Co-mentors:

Name and Surname	ID	Function
izr. prof. dr. Zoran Bosnić	3826	Mentor
doc. dr. Mojca Ciglarič	256	Co-mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537120195)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

3535

Date Deposited:

08 Sep 2016 16:22

Last Modified:

20 Sep 2016 09:52

URI:

http://eprints.fri.uni-lj.si/id/eprint/3535