ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Processing large amounts of data in the cloud

Sandi Holub (2014) Processing large amounts of data in the cloud. EngD thesis.


    Big Data is gaining recognition in the world of information technology. The term covers tools that allow large amounts of data to be stored and retrieved. Hadoop is an open-source project of the Apache Software Foundation that combines tools for the storage, processing, and retrieval of structured and/or unstructured data. Such data must be managed on appropriate infrastructure, which in most cases is a cluster of computers. If we do not wish to host the infrastructure ourselves, we can use a cloud instead. YARN, MapReduce, Pig, and the Hadoop Distributed File System (HDFS) are the basic components of the Hadoop project and make it simple to implement a first version of the software. The reader can use this diploma thesis as a guide to setting up a basic Hadoop cluster and developing a Java application or a Pig script. The comparison of running time and price between the cloud and a local cluster can also support the decision-making process when buying infrastructure.

    Item Type: Thesis (EngD thesis)
    Keywords: Hadoop, MapReduce, large amount of data, cluster, Pig, cloud
    Number of Pages: 82
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname | ID | Function
    viš. pred. dr. Aljaž Zrnec | 291 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00010718292)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 2618
    Date Deposited: 14 Jul 2014 15:47
    Last Modified: 20 Aug 2014 14:02
    URI: http://eprints.fri.uni-lj.si/id/eprint/2618
