Dušan Šmitran (2010) Prediction of popularity of news in WEB magazines using support vector machines. EngD thesis.
Machine learning methods are used successfully in text classification. The use of support vector machines (SVMs) for classifying text has grown rapidly in recent years. SVMs have performed well across a broad range of problems that do not have explicitly defined attributes. This success is attributed mainly to string kernels, which implicitly map examples into a higher-dimensional space. The SVM calls the string kernel to measure how similar two examples are. Our goal is to use an SVM to predict which news items will be the most read tomorrow. We develop the idea of using string kernels for this particular problem and compare kernels operating on two levels: one on the word level and one on the character level. We built a database of 2500 news items to test our classification models and string kernels. We searched for the optimal SVM kernel parameters and compared them using learning curves. We also built a simulated real-world environment to measure how well a model can predict which of today's news items will become widely read in the future.
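To illustrate the difference between a word-level and a character-level string kernel, here is a minimal Python sketch. It assumes a simple normalized n-gram (spectrum) kernel as a stand-in; the abstract does not state which string kernels the thesis uses, and the names `ngrams` and `spectrum_kernel` are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def spectrum_kernel(a, b, n, level="char"):
    """Normalized n-gram kernel between two strings (illustrative only).

    level="char" compares character n-grams; level="word" compares
    n-grams of whitespace-separated words. This is an assumed kernel,
    not necessarily the one used in the thesis.
    """
    if level == "char":
        tokens_a, tokens_b = list(a), list(b)
    elif level == "word":
        tokens_a, tokens_b = a.split(), b.split()
    else:
        raise ValueError("level must be 'char' or 'word'")

    counts_a = Counter(ngrams(tokens_a, n))
    counts_b = Counter(ngrams(tokens_b, n))

    # Dot product of the two n-gram count vectors.
    dot = sum(count * counts_b[gram] for gram, count in counts_a.items())

    # Normalize so that identical texts score 1.0.
    norm = sqrt(sum(c * c for c in counts_a.values())
                * sum(c * c for c in counts_b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    x = "markets fall sharply"
    y = "markets fall again"
    print(spectrum_kernel(x, y, n=2, level="word"))
    print(spectrum_kernel(x, y, n=3, level="char"))
```

A matrix of such pairwise similarities over the training news items could then be handed to an SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.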