Order-Preserving Multiple Pattern Matching in Parallel
Abstract
:1. Introduction
2. Related Works
3. Parallel Algorithms for the OPMPM Problem
3.1. Parallel OPMPM Algorithm Using the Aho–Corasick Automaton
Algorithm 1 Preprocessing phase (parallel calculation of prefix tables for a pattern set). |
Input: A set of strings Output: A set of prefix tables 1 parallel for to k do 2 parallel for to do 3 4 for to do 5 if then 6 |
Algorithm 2 Search phase. |
Input: Aho–Corasick automaton, string T Output: Positions i of substrings of T which is order-isomorphic to 1 , OST , int r 2 parallel for to b do 3 for to do 4 if then 5 break 6 .insert 7 .rank 8 while 9 .delete 10 11 .rank 12 13 if and then 14 print |
3.2. Parallel OPMPM Algorithm Using the Fingerprint Table
Algorithm 3 Parallel calculation of the fingerprint table. |
Input: A set of strings , int m, int q Output: 1 parallel for to k do 2 parallel for to q do 3 4 5 for to do 6 if then 7 8 atomicAdd |
Algorithm 4 The search phase of the algorithm using a fingerprint table. |
Input: A string , int m Output: Positions i of substrings of T which is order-isomorphic to 1 parallel for to do 2 for to k do 3 if then 4 if then 5 print |
4. Experimental Results
- Randomly generated strings: Texts and patterns consisting of were randomly generated. Text T was generated by increasing length n from 10,000 by 10,000 to 100,000. A set of patterns P was generated by increasing the number of patterns k from 100 by 100 to 1000 and increasing the pattern length m from 5 by 1 to 15.
- Dow Jones Index: Texts and patterns were extracted at random for each experiment from the daily closing price of the Dow Jones Industrial Average between 2 May 1885 and 12 April 2019 [14]. The text length n was increased from 1000 by 1000 to 10,000. The number of patterns k and their length m were set to the same values used for the randomly generated strings.
- Electrocardiogram data: The electrocardiogram (ECG) data used in the experiment were obtained from the MIT-BIH ECG biosignal database provided by Physionet [15]. An electrocardiogram records the electrical impulses from the heart. Texts were randomly extracted from the total records, while patterns were extracted evenly from abnormal symptom data and normal data. It should be noted that the texts and patterns were extracted from electrocardiogram data of different individuals. The text length n and the number of patterns k were set equal to those of the randomly generated strings, and the pattern length m increased from 10 by 1 to 15 during the generation process.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kim, J.; Eades, P.; Fleischer, R.; Hong, S.H.; Iliopoulos, C.S.; Park, K.; Puglisi, S.J.; Tokuyama, T. Order-preserving matching. Theor. Comput. Sci. 2014, 525, 68–79. [Google Scholar] [CrossRef]
- Kim, J.; Amir, A.; Na, J.C.; Park, K.; Sim, J.S. On representations of ternary order relations in numeric strings. Math. Comput. Sci. 2017, 11, 127–136. [Google Scholar] [CrossRef]
- Cho, S.; Na, J.C.; Park, K.; Sim, J.S. A fast algorithm for order-preserving pattern matching. Inf. Process. Lett. 2015, 115, 397–402. [Google Scholar] [CrossRef]
- Kim, Y.; Kim, Y.; Sim, J.S. An improved order-preserving pattern matching algorithm using fingerprints. Mathematics 2022, 10, 1954. [Google Scholar] [CrossRef]
- Na, J.C.; Lee, I. A simple heuristic for order-preserving matching. IEICE Trans. Inf. Syst. 2019, 102, 502–504. [Google Scholar] [CrossRef]
- Aho, A.V.; Corasick, M.J. Efficient string matching: An aid to bibliographic search. Commun. ACM 1975, 18, 333–340. [Google Scholar] [CrossRef]
- Han, M.; Kang, M.; Cho, S.; Gu, G.; Sim, J.S.; Park, K. Fast multiple order-preserving matching algorithms. In Proceedings of the International Workshop on Combinatorial Algorithms, Verona, Italy, 5–7 October 2015; pp. 248–259. [Google Scholar]
- Park, J.; Kim, Y.; Sim, J.S. A space-efficient hashing-based algorithm for order-preserving multiple pattern matching problem. KIISE Trans. Comput. Pract. 2018, 24, 399–404. [Google Scholar] [CrossRef]
- Park, K.B.; Kim, Y.; Sim, J.S. Parallel implementation of the order-preserving multiple pattern matching algorithm using the Karp-Rabin algorithm. J. KIISE 2021, 48, 249–256. [Google Scholar] [CrossRef]
- Kubica, M.; Kulczyński, T.; Radoszewski, J.; Rytter, W.; Waleń, T. A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett. 2013, 113, 430–433. [Google Scholar] [CrossRef]
- Knuth, D. The Art of Computer Programming, Seminumerical Algorithms; Addison-Wesley: Boston, MA, USA, 1997; Volume 2. [Google Scholar]
- Mareš, M.; Straka, M. Linear-time ranking of permutations. In Proceedings of the Algorithms–ESA 2007: 15th Annual European Symposium, Eilat, Israel, 8–10 October 2007; pp. 187–193. [Google Scholar]
- Chhabra, T.; Tarhio, J. A filtration method for order-preserving matching. Inf. Process. Lett. 2016, 116, 71–74. [Google Scholar] [CrossRef]
- Williamson, S. Daily Closing Values of the DJA in the United States, 1885 to Present, Measuring Worth. Available online: https://www.measuringworth.com/datasets/DJA/index.php (accessed on 17 December 2021).
- Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220. Available online: http://circ.ahajournals.org/content/101/23/e215 (accessed on 20 September 2022). [CrossRef] [PubMed]
- Park, S.; Kim, Y.; Sim, J.S. Comparison of order-isomophism verification times of two strings according to their representations. In Proceedings of the Korean Institute of Next Generation Computing Spring Conference, Gwangju, Republic of Korea, 1–13 May 2021; Korean Institute of Next Generation Computing: Seoul, Republic of Korea, 2021; pp. 350–353. [Google Scholar]
x | 23 | 29 | 20 | 57 | 59 |
1 | 2 | 1 | 4 | 5 | |
3 | 1 | 2 | 4 | 5 |
(a) Comparison of execution times varying m when 100,000, | |||
Execution times of the algorithms (unit: ms) | |||
m | |||
6 | 41.31 | 21.58 | 22.62 |
9 | 44.83 | 22.64 | 21.98 |
12 | 49.07 | 22.18 | 22.51 |
15 | 52.49 | 22.16 | 22.52 |
(b) Comparison of execution times varying k when 100,000, | |||
Execution times of the algorithms (unit: ms) | |||
k | |||
100 | 19.35 | 2.22 | 2.24 |
500 | 31.07 | 11.25 | 11.47 |
1000 | 44.83 | 22.64 | 21.98 |
(a) Comparison of execution times varying m when 10,000, | |||
Execution times of the algorithms (unit: ms) | |||
m | |||
6 | 9.58 | 2.29 | 2.55 |
9 | 12.84 | 2.29 | 2.25 |
12 | 16.35 | 2.22 | 2.25 |
15 | 20.33 | 2.34 | 2.45 |
(b) Comparison of execution times varying k when 10,000, | |||
Execution times of the algorithms (unit: ms) | |||
k | |||
100 | 5.03 | 0.49 | 0.47 |
500 | 8.52 | 1.35 | 1.38 |
1000 | 12.84 | 2.29 | 2.25 |
(a) Comparison of execution times varying m when 100,000, | |||
Execution times of the algorithms (unit: ms) | |||
m | |||
10 | 41.18 | 19.32 | 19.08 |
12 | 42.6 | 18.68 | 19.03 |
15 | 44.83 | 18.82 | 18.31 |
(b) Comparison of execution times varying k when 100,000, | |||
Execution times of the algorithms (unit: ms) | |||
k | |||
100 | 18.57 | 2.32 | 2.29 |
500 | 28.23 | 9.89 | 9.46 |
1000 | 41.18 | 19.32 | 19.08 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, S.; Park, J.; Kim, Y.; Sim, J.S. Order-Preserving Multiple Pattern Matching in Parallel. Appl. Sci. 2023, 13, 5142. https://doi.org/10.3390/app13085142
Park S, Park J, Kim Y, Sim JS. Order-Preserving Multiple Pattern Matching in Parallel. Applied Sciences. 2023; 13(8):5142. https://doi.org/10.3390/app13085142
Chicago/Turabian StylePark, Somin, Jinhyeok Park, Youngho Kim, and Jeong Seop Sim. 2023. "Order-Preserving Multiple Pattern Matching in Parallel" Applied Sciences 13, no. 8: 5142. https://doi.org/10.3390/app13085142
APA StylePark, S., Park, J., Kim, Y., & Sim, J. S. (2023). Order-Preserving Multiple Pattern Matching in Parallel. Applied Sciences, 13(8), 5142. https://doi.org/10.3390/app13085142