Sandi Holub (2014) Processing large amounts of data in the cloud. EngD thesis.
Abstract
Big Data is gaining recognition in the world of information technology. The term covers tools that allow the storage and retrieval of very large amounts of data. Hadoop is an open-source project of the Apache Software Foundation, which combines tools for storing, processing and retrieving structured and unstructured data. The data need to be managed with appropriate infrastructure, which in most cases means a cluster of computers. If we do not wish to maintain such infrastructure ourselves, we can use the cloud. YARN, MapReduce, Pig and the Hadoop Distributed File System (HDFS) are the basic components of the Hadoop project, and they make it straightforward to implement a first version of the software. The reader can use this thesis as a guide to setting up a basic Hadoop cluster and developing a Java application or a Pig script. The time and cost comparison between running in the cloud and on a local cluster can also support the decision-making process when purchasing infrastructure.
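As an illustration of the kind of processing the thesis describes, the following is a minimal Pig Latin sketch of a word-count job; the input and output paths ('input.txt', 'wordcounts') are placeholders and are not taken from the thesis itself.

    -- load raw text lines from HDFS (placeholder path)
    lines = LOAD 'input.txt' AS (line:chararray);
    -- split each line into words and flatten the resulting bag
    words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    -- group identical words and count the occurrences in each group
    grouped = GROUP words BY word;
    counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS occurrences;
    -- write the results back to HDFS (placeholder path)
    STORE counts INTO 'wordcounts';

A script like this would typically be run with the pig command against the cluster, while the equivalent Java application would implement the same logic as a MapReduce job.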