Erasure Codes for Cold Data in Distributed Storage Systems
Abstract
1. Introduction
- We propose a storage scheduling scheme that combines the advantages of the erasure code strategy and the replica strategy to reduce the storage space consumed by cold data.
- We propose the NewLib code scheme and the N-Schedule strategy to improve encoding and decoding speed. We also design and implement a node schedule that improves data addressing performance.
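The space argument behind the first contribution can be made concrete with a small calculation. The sketch below compares the raw storage consumed per byte of user data under n-way replication with that of a code that adds m parity blocks to k data blocks. The specific parameters (three replicas, two parity blocks as in a RAID-6 style code, and the chosen values of k) are illustrative assumptions rather than figures reported in this article, and the function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per user byte with k data blocks and m parity blocks."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


def space_saved(k: int, m: int, copies: int) -> Fraction:
    """Fraction of raw space freed by re-encoding replicated data with a (k, m) code."""
    rep = replication_overhead(copies)
    return (rep - erasure_overhead(k, m)) / rep


if __name__ == "__main__":
    # Assumed settings: 3 replicas versus two parity blocks (RAID-6 style).
    for k in (4, 6, 10):
        ec = erasure_overhead(k, 2)
        saved = space_saved(k, 2, 3)
        print(f"k={k}: replication 3x, erasure {float(ec):.2f}x, "
              f"space saved {float(saved):.1%}")
```

Under these assumptions the saving grows with k, which is why erasure coding is attractive for data that is rarely read, while replication stays preferable for hot data that needs fast access without decoding.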
2. Related Works
2.1. Research on Cloud Storage System Architecture
2.2. Decentralized Architecture
2.3. The Array Codes
2.4. Bit Matrix Coding
3. NewLib Code
3.1. Encoding
3.1.1. Encoding in the Liberation Code Pattern
3.1.2. The Aligned Request
3.1.3. The Non-Aligned Request
3.2. Decoding in the N-Schedule Scheme
4. Implementation
4.1. Architecture
4.2. Node Schedule
5. Evaluation
5.1. Experimental Setup
5.2. Read Performance Test
5.2.1. The Performance in Sequential Reading
5.2.2. The Performance in Random Reading
5.3. Write Performance Test
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yin, C.; Xu, Z.; Li, W.; Li, T.; Yuan, S.; Liu, Y. Erasure Codes for Cold Data in Distributed Storage Systems. Appl. Sci. 2023, 13, 2170. https://doi.org/10.3390/app13042170