Tomaž Hočevar (2012) Parallel string matching algorithms. EngD thesis.
Abstract
This thesis presents different string searching algorithms. The string searching or string matching problem is one of the most basic problems on strings. It resurfaced with the development of bioinformatics and the need for DNA sequence analysis. It also presents a foundation for solving other, more complex string problems. First, we describe classical algoritms with linear time complexity such as Knuth-Morris-Pratt and Rabin-Karp. Then we turned our attention to the possibilites of parallelization. We present a simple parallelization scheme which finds the pattern in a string of size N in O(sqrt(N)) time on O(sqrt(N)) processors and a more complex Vishkin algoritm which solves it in O(log(N)) time. The algorithms were compared on real as well as on degenerate test cases. They were implemented in C++ and with the use of OpenMP library for parallelization. We determined that the basic algorithms were sufficient for most practical purposes. The degenerate cases can also be solved efficiently with sequential linear time algorithms. However, the experiments on parallelization showed that multicore computers these days differ too much from computation models to reach the expected theoretic speedup.
Actions (login required)