An HBase-Based Optimization Model for Distributed Medical Data Storage and Retrieval
Abstract
1. Introduction
- (1) A data temperature recognition method, S-TCR, and a data management algorithm, SL-TCR, were proposed to manage medical data dynamically;
- (2) An optimized secondary indexing strategy was proposed to speed up diverse queries on medical data;
- (3) The feasibility and efficiency of the proposed model were verified by experiments on real medical data sets.
2. Background Knowledge
2.1. HBase Database
2.2. Hot and Cold Data Management Algorithms
2.2.1. LRU
2.2.2. LFU
2.2.3. Size
2.2.4. TCR
2.3. Secondary Index
2.4. Summary
3. Materials and Methods
3.1. Overview
3.2. Temperature Marking Module
- (1) The temperature change in S-TCR over time was simulated considering only the access time. Data-1 refers to data that were never accessed. Data-2 refers to data that were frequently accessed in the first 100 s and never accessed in the last 200 s. Data-3 refers to data that were never accessed in the first 200 s and frequently accessed in the following 100 s. Data-4 refers to data that were never accessed in the first 280 s and frequently accessed in the final 20 s. The temperature changes of the four kinds of data over time are shown in Figure 7.
- (2) The temperature change in S-TCR over time was simulated considering only the access frequency. Data-1, Data-2, and Data-3 refer to data that were never accessed in the first 280 s but were frequently accessed in the final 20 s. Data-1 was the control group; Data-2 was accessed twice as often as Data-1; Data-3 was accessed three times as often as Data-1. The temperature changes of the three kinds of data over time are shown in Figure 8.
- (3) The temperature change in S-TCR over time was simulated considering only the data size. Data-1, Data-2, and Data-3 refer to data that were never accessed in the first 280 s but were frequently accessed in the final 20 s. Data-1 was the control group; the size of Data-2 was 100 times that of Data-1; the size of Data-3 was 1000 times that of Data-1. The temperature changes of the three kinds of data over time are shown in Figure 9.
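The access-time and access-frequency scenarios above can be mimicked with a small simulation. The S-TCR formula itself is not reproduced in this section, so the sketch below swaps in a stand-in model that is purely an assumption: the temperature cools exponentially every second and is raised by a fixed amount on each access. The function `simulate`, the parameters `cooling` and `gain`, and the choice of one access per second for "frequently accessed" are all hypothetical.

```python
import math


def simulate(access_seconds, duration=300, cooling=0.02, gain=1.0):
    """Return the temperature at the end of each second.

    Stand-in model (an assumption, not the paper's S-TCR formula):
    the temperature decays by exp(-cooling) every second and rises
    by `gain` for every access that falls within that second.
    """
    per_second = {}
    for s in access_seconds:
        per_second[s] = per_second.get(s, 0) + 1
    temp, trace = 0.0, []
    for second in range(duration):
        temp *= math.exp(-cooling)                 # cooling step
        temp += gain * per_second.get(second, 0)   # heating on access
        trace.append(temp)
    return trace


# Scenario (1): only the access time differs.
time_patterns = {
    "Data-1": [],                       # never accessed
    "Data-2": list(range(0, 100)),      # first 100 s only
    "Data-3": list(range(200, 300)),    # last 100 s only
    "Data-4": list(range(280, 300)),    # final 20 s only
}
time_traces = {name: simulate(p) for name, p in time_patterns.items()}

# Scenario (2): same window (final 20 s), 1x, 2x and 3x the access rate.
freq_traces = {
    f"Data-{m}": simulate([s for s in range(280, 300) for _ in range(m)])
    for m in (1, 2, 3)
}

for name, trace in time_traces.items():
    print(name, round(trace[-1], 3))
```

Under this stand-in model, recently and heavily accessed data end the 300 s window hotter than data whose accesses lie far in the past, which is the qualitative behaviour the scenarios are designed to expose. The data-size scenario is left out because the section does not say how size enters the temperature.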
3.3. Dynamic Management Module of Data
Algorithm 1. The process of algorithm SL-TCR

Input: key, value, T, t, size, threshold1, threshold2
Output:

    data = new Node(key, value, T, t, size)
    if WarmArea.size() < threshold1 then
        WarmArea.put(data);
    end
    else if WarmArea.size() ≥ threshold1 then
        update_Temperature();
        HotCold ← sort(stcr);
        if ltcr.size < threshold2 then
            WarmArea.remove(Hot);
            HotArea.put(Hot);
        end
        else
            for i ← 0 to k do
                Cold ← WarmArea.remove();
            end
            HotArea.remove(node.T < Cold.T);
        end
        WarmArea.put(data);
    end
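One way to read Algorithm 1 is sketched below in Python. It is an interpretation under stated assumptions, not the authors' implementation: `ltcr.size` is taken to mean the current size of the hot area, `Cold` is taken to be the last of the `k` coldest nodes removed from the warm area, nodes dropped from either area are collected in a hypothetical `evicted` list standing in for cold storage, and temperature updating is left as an optional hook (`temperature_fn`) because the S-TCR formula is not given in this section.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    key: str
    value: bytes
    T: float    # current temperature
    t: float    # last access time
    size: int   # data size


class SLTCR:
    def __init__(self, threshold1: int, threshold2: int, k: int = 1,
                 temperature_fn: Optional[Callable[[Node], float]] = None):
        if threshold1 < 1 or k < 1:
            raise ValueError("threshold1 and k must be at least 1")
        self.threshold1 = threshold1    # capacity of the warm area
        self.threshold2 = threshold2    # capacity of the hot area (assumed)
        self.k = k                      # nodes evicted per overflow
        self.temperature_fn = temperature_fn
        self.warm: dict = {}
        self.hot: dict = {}
        self.evicted: list = []         # hypothetical cold-storage stand-in

    def _update_temperature(self) -> None:
        # Hook for S-TCR; without it, stored temperatures are used as-is.
        if self.temperature_fn is not None:
            for node in list(self.warm.values()) + list(self.hot.values()):
                node.T = self.temperature_fn(node)

    def put(self, data: Node) -> None:
        if len(self.warm) < self.threshold1:
            self.warm[data.key] = data
            return
        self._update_temperature()
        ranked = sorted(self.warm.values(), key=lambda n: n.T)  # coldest first
        if len(self.hot) < self.threshold2:
            # Promote the hottest warm node to the hot area.
            hottest = ranked[-1]
            del self.warm[hottest.key]
            self.hot[hottest.key] = hottest
        else:
            # Evict the k coldest warm nodes.
            coldest = ranked[: self.k]
            for cold in coldest:
                del self.warm[cold.key]
                self.evicted.append(cold)
            # Drop hot nodes that are colder than the last evicted node.
            floor = coldest[-1].T
            for key in [k for k, n in self.hot.items() if n.T < floor]:
                self.evicted.append(self.hot.pop(key))
        self.warm[data.key] = data


cache = SLTCR(threshold1=2, threshold2=1, k=1)
for i, temp in enumerate([5.0, 9.0, 1.0, 3.0]):
    cache.put(Node(f"row{i}", b"", temp, 0.0, 1))
print(sorted(cache.warm), list(cache.hot), [n.key for n in cache.evicted])
```

In the usage example, the third insert finds the warm area full and the hot area empty, so the hottest warm node is promoted; the fourth insert finds both areas full, so the coldest warm node is evicted before the new node is admitted.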
3.4. Index Management Module
4. Experiments and Results
4.1. Performance of Dynamic Management Model
4.2. Performance of Secondary Index
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Number of Queries | HBase | Bloom Filter | Same RegionServer | Ours |
---|---|---|---|---|
200 | 332.78 | 292.988 | 324.5 | 279.1359 |
400 | 685.27 | 573.3687 | 658.03 | 552.9814 |
600 | 1027.73 | 822.6875 | 965.95 | 774.67 |
800 | 1358.72 | 1050.404 | 1279.28 | 985.3437 |
1000 | 1704.06 | 1290.018 | 1583.52 | 1179.21 |
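To compare the columns of the table on a common scale, the relative reduction of each method against plain HBase can be computed row by row. The sketch below only does this arithmetic on the values listed above; the unit of the measurements is not stated in this section, so only unit-free percentages are derived. The names `rows` and `reduction` are introduced here for illustration.

```python
# Values copied from the table: queries -> (HBase, Bloom Filter, Same RegionServer, Ours)
rows = {
    200: (332.78, 292.988, 324.5, 279.1359),
    400: (685.27, 573.3687, 658.03, 552.9814),
    600: (1027.73, 822.6875, 965.95, 774.67),
    800: (1358.72, 1050.404, 1279.28, 985.3437),
    1000: (1704.06, 1290.018, 1583.52, 1179.21),
}


def reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


for n, (hbase, bloom, same_rs, ours) in rows.items():
    print(
        f"{n:>5} queries: "
        f"Bloom Filter {reduction(hbase, bloom):5.1f}%  "
        f"Same RegionServer {reduction(hbase, same_rs):5.1f}%  "
        f"Ours {reduction(hbase, ours):5.1f}%"
    )
```

Expressed this way, the gap between the proposed model and plain HBase widens as the number of queries grows across the five rows.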
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhu, C.; Liu, Z.; Zou, B.; Xiao, Y.; Zeng, M.; Wang, H.; Fan, Z. An HBase-Based Optimization Model for Distributed Medical Data Storage and Retrieval. Electronics 2023, 12, 987. https://doi.org/10.3390/electronics12040987