ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Prediction of popularity of news in WEB magazines using support vector machines

Dušan Šmitran (2010) Prediction of popularity of news in WEB magazines using support vector machines. EngD thesis.


    Machine learning methods are used successfully in text classification, and the use of support vector machines (SVMs) for classifying text has grown rapidly in recent years. SVMs have performed well on problems that lack explicitly defined attributes. This success is attributed mainly to the use of string kernels, which map examples into a higher-dimensional space. The SVM calls the string kernel to measure how similar two examples are. Our goal is to use a support vector machine to predict which news items will be the most read tomorrow. We develop the idea of using string kernels for this particular problem and compare kernels operating on two levels: one on the word level and one on the character level. We built a database of 2500 news items to test our classification models and string kernels. We searched for the optimal SVM kernel parameters and compared the resulting models using learning curves. We also built a real-world environment that simulates how well a model can predict which of today's news items will become widely read in the future.
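
    The difference between a word-level and a character-level string kernel can be illustrated with a small sketch. The abstract does not say which string kernels the thesis uses, so this example assumes a simple normalized n-gram (spectrum) kernel. The function names (char_ngrams, word_ngrams, spectrum_kernel) and the default n-gram lengths are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (assumed character-level features)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n=1):
    """Count overlapping word n-grams (assumed word-level features)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def spectrum_kernel(a, b, featurize):
    """Cosine-normalized dot product of n-gram counts for two texts.

    Returns a similarity between 0.0 (no shared n-grams) and 1.0
    (identical n-gram profiles). This is an illustrative kernel, not
    necessarily the one used in the thesis.
    """
    fa, fb = featurize(a), featurize(b)
    dot = sum(count * fb[gram] for gram, count in fa.items())
    norm = sqrt(sum(c * c for c in fa.values()) * sum(c * c for c in fb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    first = "Stocks rise after central bank decision"
    second = "Stocks fall after central bank meeting"
    print("word level:", spectrum_kernel(first, second, word_ngrams))
    print("char level:", spectrum_kernel(first, second, char_ngrams))
```

    An SVM implementation that accepts a precomputed kernel matrix could be given the pairwise values of such a function over all training texts. The character-level variant also rewards partial overlaps between different word forms, while the word-level variant only counts exact word matches.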

    Item Type: Thesis (EngD thesis)
    Keywords: text mining, machine learning, news ranking, support vector machine
    Number of Pages: 40
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    prof. dr. Blaž Zupan | 106 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=7698260)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 1052
    Date Deposited: 30 Mar 2010 08:33
    Last Modified: 13 Aug 2011 00:36
    URI: http://eprints.fri.uni-lj.si/id/eprint/1052
