Miloš Pisarević (2016) Comparing lossless data compression algorithms using the ALGator system. EngD thesis.
Abstract
The purpose of the thesis is to present, test and compare the fundamental lossless data compression algorithms. The theoretical part outlines some general characteristics of data coding and examines in more detail the differences between types of codes, particularly variable-length codes (VLC). It also introduces information theory and explains why it plays an important role in coding. It then explains the algorithms themselves in detail, beginning with the statistical algorithms: Shannon-Fano coding, Huffman coding and arithmetic coding. These are followed by dictionary-based algorithms, namely LZ77 and LZ78 and their respective variants LZSS and LZW. The last algorithm presented is the Burrows-Wheeler algorithm, which combines the Burrows-Wheeler transform and the move-to-front transform with entropy coding. The practical part deals with the implementation and testing of Huffman coding, arithmetic coding, LZSS, LZW and Bzip2. The algorithms are implemented in the Java programming language, based on existing open-source implementations, and are tested with the ALGator system. The tests are performed on several widely known test collections developed specifically for evaluating lossless compression: Calgary, Canterbury, Silesia and Maximum. Comparing the test results shows how effective each of the implemented algorithms is.
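To make the Burrows-Wheeler pipeline mentioned above more concrete, here is a minimal Java sketch of its first two stages: the forward Burrows-Wheeler transform followed by move-to-front. It is an illustration only, not the thesis's code or any of the open-source implementations it tests. The class name `BwtMtfSketch` and its methods are hypothetical; the sketch assumes ASCII (single-byte) input, sorts rotations naively in O(n² log n), and omits the run-length and entropy-coding stages that a real Bzip2 implementation adds.

```java
import java.util.Arrays;
import java.util.LinkedList;

/** Hypothetical sketch of BWT + move-to-front; assumes ASCII input, for illustration only. */
public class BwtMtfSketch {

    /** Output of the forward BWT: the last column plus the row where the original string ends up. */
    public static final class BwtResult {
        public final String lastColumn;
        public final int primaryIndex;

        BwtResult(String lastColumn, int primaryIndex) {
            this.lastColumn = lastColumn;
            this.primaryIndex = primaryIndex;
        }
    }

    /** Naive forward BWT: sort all cyclic rotations, then read off their last characters. */
    public static BwtResult bwt(String s) {
        int n = s.length();
        Integer[] rotations = new Integer[n];
        for (int i = 0; i < n; i++) rotations[i] = i;

        // Compare the rotations starting at offsets a and b character by character.
        Arrays.sort(rotations, (a, b) -> {
            for (int k = 0; k < n; k++) {
                char ca = s.charAt((a + k) % n);
                char cb = s.charAt((b + k) % n);
                if (ca != cb) return Character.compare(ca, cb);
            }
            return 0;
        });

        StringBuilder last = new StringBuilder(n);
        int primary = -1;
        for (int row = 0; row < n; row++) {
            int start = rotations[row];
            // The last character of a rotation is the one just before its start offset.
            last.append(s.charAt((start + n - 1) % n));
            if (start == 0) primary = row; // row holding the unrotated input
        }
        return new BwtResult(last.toString(), primary);
    }

    /** Move-to-front over byte values 0..255: emit each symbol's current position, then move it to the front. */
    public static int[] mtf(String s) {
        LinkedList<Integer> table = new LinkedList<>();
        for (int c = 0; c < 256; c++) table.add(c);

        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i) & 0xFF;   // assumes single-byte characters
            int pos = table.indexOf(c);   // search by value (autoboxed Integer)
            out[i] = pos;
            table.remove(pos);            // remove by index
            table.addFirst(c);
        }
        return out;
    }
}
```

For example, `bwt("banana")` yields the last column `"nnbaaa"` with primary index 3, and `mtf("nnbaaa")` yields `[110, 0, 99, 99, 0, 0]`. The BWT groups equal characters together, so MTF turns repeated symbols into runs of small numbers (here the zeros), which is what makes the subsequent entropy-coding stage effective.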