TY - GEN
T1 - High performance parallelization of Boyer-Moore algorithm on many-core accelerators
AU - Jeong, Yosang
AU - Lee, Myungho
AU - Nam, Dukyun
AU - Kim, Jik Soo
AU - Hwang, Soonwook
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2015/1/26
Y1 - 2015/1/26
N2 - Boyer-Moore (BM) algorithm is a single pattern string matching algorithm. It is considered as the most efficient string matching algorithm and used in many applications. The algorithm first calculates two string shift rules based on the given pattern string in the preprocessing phase. These rules help skip parts of the target input string where there is no match to be found. Using the two shift rules, pattern matching operations are performed against the target input sting in the second phase. The second phase is a time consuming process and needs to be parallelized to achieve the high performance string matching. In this paper, we parallelize the BM algorithm on the latest many core accelerators such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with the general-purpose multi-core processors. We partition the target input data amongst multiple threads for parallel execution. Data lying on the threads' boundaries need to be copied redundantly so that the pattern string lying on the boundary can be found. As the target length increases, the algorithm incurs increased matching operations. Also, as the pattern length increases, the number of possible matches decreases. This can potentially lead to the unbalanced workload distribution among threads. Furthermore, the redundant data copy significantly overloads the on-chip shared memories of the GPU for a large number of threads. We use the dynamic scheduling and the multithreading techniques to solve the load balancing problem. We also use the algorithmic cascading technique to reduce the burden on the shared memories of the GPU. Our parallel implementation leads to ∼ 17-times speedup on the Xeon Phi and ∼ 45-times speedup on the Nvidia Tesla K20GPU compared with a serial implementation on the host Intel Xeon processor.
AB - Boyer-Moore (BM) algorithm is a single pattern string matching algorithm. It is considered as the most efficient string matching algorithm and used in many applications. The algorithm first calculates two string shift rules based on the given pattern string in the preprocessing phase. These rules help skip parts of the target input string where there is no match to be found. Using the two shift rules, pattern matching operations are performed against the target input sting in the second phase. The second phase is a time consuming process and needs to be parallelized to achieve the high performance string matching. In this paper, we parallelize the BM algorithm on the latest many core accelerators such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with the general-purpose multi-core processors. We partition the target input data amongst multiple threads for parallel execution. Data lying on the threads' boundaries need to be copied redundantly so that the pattern string lying on the boundary can be found. As the target length increases, the algorithm incurs increased matching operations. Also, as the pattern length increases, the number of possible matches decreases. This can potentially lead to the unbalanced workload distribution among threads. Furthermore, the redundant data copy significantly overloads the on-chip shared memories of the GPU for a large number of threads. We use the dynamic scheduling and the multithreading techniques to solve the load balancing problem. We also use the algorithmic cascading technique to reduce the burden on the shared memories of the GPU. Our parallel implementation leads to ∼ 17-times speedup on the Xeon Phi and ∼ 45-times speedup on the Nvidia Tesla K20GPU compared with a serial implementation on the host Intel Xeon processor.
KW - Algorithmic cascading
KW - Boyer-Moore algorithm
KW - dynamic scheduling
KW - many-core accelerator
KW - multithreading
KW - parallelization
UR - http://www.scopus.com/inward/record.url?scp=84924007670&partnerID=8YFLogxK
U2 - 10.1109/ICCAC.2014.20
DO - 10.1109/ICCAC.2014.20
M3 - Conference contribution
AN - SCOPUS:84924007670
T3 - Proceedings - 2014 International Conference on Cloud and Autonomic Computing, ICCAC 2014
SP - 265
EP - 272
BT - Proceedings - 2014 International Conference on Cloud and Autonomic Computing, ICCAC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 International Conference on Cloud and Autonomic Computing, ICCAC 2014
Y2 - 8 September 2014 through 12 September 2014
ER -