Text mining library for Orange data mining suite

David Novak (2016) Text mining library for Orange data mining suite. MSc thesis.

Abstract

We have developed a text mining system that can be used as an add-on for Orange, a data mining platform. Orange includes a set of supervised and unsupervised machine learning methods that are useful in a typical text mining platform, and it therefore offers an excellent foundation for development. We studied the field of text mining and reviewed several open-source toolkits to define the system's base components. We included widgets that retrieve data from remote repositories, such as PubMed and The New York Times. The pre-processing was designed to include transformation of documents to vectors, stop word removal, lemmatization, and stemming. The results can be visualized with widgets such as the word cloud. Our goal was to develop widgets that can be easily incorporated into existing Orange workflows, can be extended with additional widgets, and perform well in a visual programming environment.
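
As a rough illustration of the pre-processing steps named in the abstract, the following minimal sketch tokenizes documents, removes stop words, applies a crude suffix-stripping stemmer, and builds term-count vectors. It is not the thesis implementation and does not use the Orange API. The stop word list, the function names, and the stemming rule are all assumptions made for this example, and lemmatization is omitted because it requires a dictionary-based resource.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "are", "in", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip one common English suffix (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one term-count vector per document."""
    counts = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


docs = ["The widgets are mining texts", "Text mining in Orange"]
vocabulary, vectors = bag_of_words(docs)
```

Each row of `vectors` is aligned with `vocabulary`, so the two documents become comparable numeric vectors of the kind that supervised and unsupervised learners expect. A production pipeline would replace `naive_stem` with an established stemmer or lemmatizer and use a full stop word list.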

Item Type:

Thesis (MSc thesis)

Keywords:

text mining, data pre-processing, visualization, visual programming

Number of Pages:

Language of Content:

Slovenian

Mentor / Comentors:

Name and Surname	ID	Function
prof. dr. Blaž Zupan	106	Mentor

Link to COBISS:

http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537041091)

Institution:

University of Ljubljana

Department:

Faculty of Computer and Information Science

Item ID:

3374

Date Deposited:

23 Jun 2016 15:16

Last Modified:

28 Jul 2016 08:25

URI:

http://eprints.fri.uni-lj.si/id/eprint/3374
