ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Text mining library for Orange data mining suite

David Novak (2016) Text mining library for Orange data mining suite. MSc thesis.


    We developed a text mining system that works as an add-on for Orange, a data mining platform. Orange includes a set of supervised and unsupervised machine learning methods that a typical text mining platform relies on, and therefore offers an excellent foundation for development. We studied the field of text mining and reviewed several open-source toolkits to define the base components of a text mining system. The add-on includes widgets that retrieve data from remote repositories such as PubMed and The New York Times. Pre-processing includes transformation of documents to vectors, stop word removal, lemmatization and stemming. Results can be visualized with widgets such as the word cloud. Our goal was to develop widgets that are easy to incorporate into existing Orange workflows, can be extended with additional widgets, and perform well in a visual programming environment.
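
    The pre-processing steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the thesis's implementation and does not use the Orange API. The stop-word list, the `toy_stem` suffix stripper (a crude stand-in for a real stemmer or lemmatizer), and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(token):
    """Strip a few common English suffixes.

    A crude stand-in for a real stemmer; it keeps at least
    three characters of the original token.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [toy_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def bag_of_words(documents):
    """Turn documents into term-count vectors over a shared vocabulary."""
    token_lists = [preprocess(d) for d in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["The miners are mining text.", "Text mining widgets in Orange."]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

    Each document becomes one row of term counts over the shared vocabulary, which is the kind of table that downstream learners and visualizations consume. The toy stemmer over-strips some words ("mining" becomes "min"), which is one reason real pipelines use an established stemmer or lemmatizer.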

    Item Type: Thesis (MSc thesis)
    Keywords: text mining, data pre-processing, visualization, visual programming
    Number of Pages: 87
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname        ID     Function
    prof. dr. Blaž Zupan    106    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537041091)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3374
    Date Deposited: 23 Jun 2016 15:16
    Last Modified: 28 Jul 2016 08:25
    URI: http://eprints.fri.uni-lj.si/id/eprint/3374
