Prediction of popularity of news in WEB magazines using support vector machines

Dušan Šmitran (2010) Prediction of popularity of news in WEB magazines using support vector machines. EngD thesis.


Abstract

Machine learning methods are used successfully in text classification. The use of support vector machines (SVMs) for classifying text has grown rapidly in recent years. SVMs have performed well on problems that lack explicitly defined attributes. This success is attributed mainly to the use of string kernels, which map examples into a higher-dimensional space. The SVM calls the string kernel to measure how similar two examples are. Our goal is to use an SVM to predict which news items will be the most read tomorrow. We develop the idea of using string kernels for this particular problem and compare kernels operating on two levels: one on the word level and one on the character level. We built a database of 2500 news items to test our classification models and string kernels. We searched for the optimal SVM kernel parameters and compared them using learning curves. We also built a realistic test environment that simulates how well a model can predict which of today's news items will become widely read in the future.
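To make the word-level versus character-level comparison concrete, here is a minimal sketch of one simple kind of string kernel, a normalized n-gram (spectrum) kernel. The abstract does not say which string kernels the thesis uses, so the kernel choice, the function names (`ngrams`, `spectrum_kernel`, `gram_matrix`) and the lowercasing and whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def spectrum_kernel(a, b, n, level="char"):
    """Normalized n-gram kernel between two texts (illustrative assumption).

    level="char" compares character n-grams; level="word" compares
    n-grams of lowercased, whitespace-separated words.
    """
    if level == "word":
        tokens_a, tokens_b = a.lower().split(), b.lower().split()
    else:
        tokens_a, tokens_b = list(a.lower()), list(b.lower())
    counts_a = Counter(ngrams(tokens_a, n))
    counts_b = Counter(ngrams(tokens_b, n))
    # Dot product of the two n-gram count vectors.
    dot = sum(count * counts_b[gram] for gram, count in counts_a.items())
    # Product of the two vector norms, used to scale the result into [0, 1].
    norm = sqrt(sum(v * v for v in counts_a.values())
                * sum(v * v for v in counts_b.values()))
    return dot / norm if norm else 0.0


def gram_matrix(texts, n, level):
    """Pairwise kernel values for a list of texts."""
    return [[spectrum_kernel(x, y, n, level) for y in texts] for x in texts]
```

A matrix produced by `gram_matrix` could then be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Whether the thesis used a comparable setup is not stated in the abstract.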

Item Type:

Thesis (EngD thesis)

Keywords:

text mining, machine learning, news ranking, support vector machine

Number of Pages:

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
prof. dr. Blaž Zupan	106	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=7698260)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

1052

Date Deposited:

30 Mar 2010 08:33

Last Modified:

13 Aug 2011 00:36

URI:

http://eprints.fri.uni-lj.si/id/eprint/1052
