ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Machine learning algorithms in distributed environment with MapReduce paradigm

Roman Orač (2014) Machine learning algorithms in distributed environment with MapReduce paradigm. MSc thesis.



    Implementing machine learning algorithms in a distributed environment offers several advantages, such as the ability to process large datasets and linear speedup with additional processing units. We describe the MapReduce paradigm, which enables distributed computing, and the Disco framework, which implements it. We present the summation form, a condition for implementing algorithms efficiently in the MapReduce paradigm, and describe the implementations of the selected algorithms. We propose novel distributed random forest algorithms that build models on subsets of the dataset. We compare the running time and accuracy of the algorithms with those of well-established data analytics tools. We conclude the thesis by describing the integration of the implemented algorithms into the ClowdFlows platform, a web platform for the construction, execution, and sharing of interactive data mining workflows. This integration enables the processing of large batch data through visual programming.
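
The summation form mentioned in the abstract can be illustrated with a small sketch. It is not taken from the thesis and does not use the Disco API. It assumes plain Python, simple one-variable least-squares regression as the example algorithm, and hypothetical function names (`map_partial_sums`, `reduce_sums`, `fit_line`). Each data chunk is reduced to a handful of partial sums in the map step, the partial sums are added in the reduce step, and the model is computed from the totals alone.

```python
from functools import reduce


def map_partial_sums(chunk):
    """Map step: condense one chunk of (x, y) pairs into its partial sums."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxy = sum(x * y for x, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    return (n, sx, sy, sxy, sxx)


def reduce_sums(a, b):
    """Reduce step: add two tuples of partial sums element by element."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(chunks):
    """Fit y = slope * x + intercept using only the summed statistics."""
    n, sx, sy, sxy, sxx = reduce(reduce_sums, map(map_partial_sums, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data lying on y = 2x + 1, split into two chunks as if
# they were stored on two different workers.
chunks = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
slope, intercept = fit_line(chunks)
print(slope, intercept)
```

Because the chunks interact only through addition, the map step can run on each chunk independently, and the result does not depend on how the data is partitioned.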

    Item Type: Thesis (MSc thesis)
    Keywords: MapReduce, distributed computing, Disco, machine learning, summation form, DiscoMLL, distributed random forest, ClowdFlows.
    Number of Pages: 123
    Language of Content: Slovenian
    Mentor / Comentors:
        izr. prof. dr. Marko Robnik Šikonja (ID: 276), Mentor
        prof. dr. Nada Lavrač, Comentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536017347)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 2829
    Date Deposited: 15 Oct 2014 19:56
    Last Modified: 06 Nov 2014 11:19
    URI: http://eprints.fri.uni-lj.si/id/eprint/2829
