Hardware-Based Implementation of Algorithms for Data Replacement in Cache Memory of Processor Cores
Abstract
:1. Introduction
1.1. Motivation
1.2. State of the Art
- The study of replacement policies during the caching of instructions and data in the cache memory of the CPU;
- The study of replacement policies during the caching in the device with FPGA;
- Research on the adaptive associativity of a reconfigurable cache scheme;
- The study of replacement policies in content-oriented computer networks;
- Research on data caching schemes for wireless networks;
- The study of the efficiency of different methods of caching of key-value storage for Blockchain in FPGA;
- The study of a cache-based matrix technique using the Hadoop Distributed File System (HDFS), where the EC volume consists of a Reed–Solomon code with parameters of (6, 3).
1.3. Objectives and Structure
- To analyze the structures of synchronous digital automata for further synthesis of the minimal hardware implementations of PLRUt and PLRUm replacement policies;
- To develop a minimal logic model for further research in a simulation environment;
- To evaluate the parameters of hardware implementations from the point of view of the delay of the transient process with the use of logical elements of NAND basis;
- To evaluate the complexity measured in logical elements and the reliability of hardware implementations;
- To analyze the possibility of the adaptation of the processor core cache memory to each of the PLRUt and PLRUm policies depending on the selected priority characteristic.
2. Methodology and Stages of This Study
- A formal description of the structure of the synchronous digital automata followed by the synthesis of logical models of the circuits of PLRUt and PLRUm replacement algorithms, which are based on the theory of digital automata;
- The minimization of defined switching functions;
- An evaluation of gate complexity and the reliability of the circuits;
- The experimental implementation of the model in FPGA and an investigation of the parameters.
3. Hardware Implementation of PLRUt and PLRUm
3.1. Automata Model of Algorithm PLRUt
3.2. Synthesis of Hardware Implementation of PLRUt
3.3. Research of Hardware Implementation for Algorithm PLRUt
3.4. Automata Model of Hardware Implementation of PLRUm Algorithm
3.5. Synthesis of Hardware Implementation of PLRUm Algorithm with Ways of q = 4
3.6. Research of Hardware Implementation of PLRUm Algorithm with Ways q = 4
3.6.1. Evaluation of Latency and Complexity
3.6.2. Evaluation of Reliability
4. Summarized Results of Study of Hardware Implementation of PLRUm Algorithm with Ways of q = 8, q = 16, and q = 32
4.1. Research of Hardware Implementations of PLRUm Algorithm with Ways of q = 8, q = 16, and q = 32
4.1.1. Evaluation of Complexity and Reliability
4.1.2. Evaluation of Delay
4.2. Possible Limitations and Challenges of Hardware Implementation of Both Algorithms
5. Discussion
5.1. Comparative Analysis
5.2. Adaptation
5.3. Further Development
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Embedded Intel486 Processor Hardware Reference Manual; Intel Corporation: Santa Clara, CA, USA, 1997; p. 334.
- Brey, B.B. The Intel Microprocessors: 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium, Pentium Pro Processor, Pentium II, Pentium III, Pentium 4, and Core2 with 64-Bit Extensions: Architecture, Programming, and Interfacing; Pearson Prentice Hall: Old Bridge, NJ, USA, 2009; p. 925. [Google Scholar]
- Adkins, A.; Ammeson, B.; Anouna, J.; Garside, T.; Hunker, L.; Mailand, S. Intel Core i7 Memory Hierarchy. Available online: http://web.cs.wpi.edu/~cs4515/d15/Protected/LecturesNotes_D15/Week3_TeamA_i7-Presentation.pdf (accessed on 23 May 2023).
- Clemente, J.A.; Gran, R.; Chocano, A.; del Prado, C.; Resano, J. Hardware Architectural Support for Caching Partitioned Reconfigurations in Reconfigurable Systems. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2016, 24, 530–543. [Google Scholar] [CrossRef]
- Omran, S.S.; Amory, I.A. Implementation of LRU Replacement Policy for Reconfigurable Cache Memory Using FPGA. In Proceedings of the 2018 International Conference on Advanced Science and Engineering (ICOASE), Kurdistan Region, Iraq, 9–11 October 2018; pp. 13–18. [Google Scholar]
- Inoue, H. Multi-step LRU: Simd-based Cache Replacement for Lower Overhead and Higher Precision. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 174–180. [Google Scholar]
- Wang, C.; Wang, J.; Wang, M. Cache Performance Research for Embedded Processors. Phys. Procedia 2012, 25, 1322–1328. [Google Scholar] [CrossRef]
- Shimizu, A.; Townley, D.; Joshi, M.; Ponomarev, D. EA-PLRU: Enclave-aware Cache Replacement. In Proceedings of the HASP ’19: Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy, Phoenix, AZ, USA, 23 June 2019. [Google Scholar]
- Abel, A.; Reineke, J. Reverse engineering of cache replacement policies in Intel microprocessors and their evaluation. In Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2014), Monterey, CA, USA, 23–25 March 2014; IEEE Computer Society: Washington, DC, USA, 2014; pp. 141–142. [Google Scholar]
- Roque, J.V.; Lopes, J.D.; Véstias, M.P.; de Sousa, J.T. IOb-Cache: A High-Performance Configurable Open-Source Cache. Algorithms 2021, 14, 218. [Google Scholar] [CrossRef]
- Zhang, K.; Wang, Z.; Chen, Y.; Zhu, H.; Sun, X.-H. PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers. In Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Newport Beach, CA, USA, 23–26 May 2011; pp. 265–274. [Google Scholar]
- Lentz, M.; Franklin, M. Performance of Private Cache Replacement Policies for Multicore Processors. In Proceedings of the 4th International Conference on Computer Science, Engineering and Applications, Dubai, United Arab Emirates, 7–8 March 2014; Volume 4, pp. 1–7. [Google Scholar]
- Perez, W.J.H.; Sanchez, E.; Reorda, M.S.; Tonda, A.; Medina, J.V. Functional Test Generation for the Plru Replacement Mechanism of Embedded Cache Memories. In Proceedings of the 2011 12th Latin American Test Workshop (LATW), Beach of Porto de Galinhas, Brazil, 27–30 March 2011; pp. 1–6. [Google Scholar]
- Grund, D.; Reineke, J. Toward Precise PLRU Cache Analysis. In Proceedings of the 10th International Workshop on Worst-Case Execution Time Analysis, WCET 2010, Brussels, Belgium, 6 July 2010; Volume 15, pp. 23–35. [Google Scholar]
- Puidenko, V.; Kharchenko, V. The minimizating of logical scheme for implementation of pseudo LRU by inter-type transition in trigger structures. Radioel. Comp. Syst. 2020, 2, 33–47. [Google Scholar]
- Puidenko, V. Automaton model, device synthesis and adaptive substitution algorithm for cache memory. Radioel. Comp. Syst. 2020, 4, 68–78. [Google Scholar]
- Zhu, W.; Zeng, X. Decision Tree-Based Adaptive Reconfigurable Cache Scheme. Algorithms 2021, 14, 176. [Google Scholar] [CrossRef]
- Rashid, S.; Razak, S.A.; Ghaleb, F.A. IMU: A Content Replacement Policy for CCN, Based on Immature Content Selection. Appl. Sci. 2022, 12, 344. [Google Scholar] [CrossRef]
- Sheraz, M.; Shafique, S.; Imran, S.; Asif, M.; Ullah, R.; Ibrar, M.; Khan, J.; Wuttisittikulkij, L. A Reinforcement Learning Based Data Caching in Wireless Networks. Appl. Sci. 2022, 12, 5692. [Google Scholar] [CrossRef]
- Fang, J.; Kong, H.; Yang, H.; Xu, Y.; Cai, M. A Heterogeneity-Aware Replacement Policy for the Partitioned Cache on Asymmetric Multi-Core Architectures. Micromachines 2022, 13, 2014. [Google Scholar] [CrossRef] [PubMed]
- Knoben, P.; Alachiotis, N. Improving Performance of Hardware Accelerators by Optimizing Data Movement: A Bioinformatics Case Study. Electronics 2023, 12, 586. [Google Scholar] [CrossRef]
- Siddiqui, M.F.; Ali, F.; Javed, M.A.; Khan, M.B.; Saudagar, A.K.J.; Alkhathami, M.; Abul Hasanat, M.H. An FPGA-Based Performance Analysis of Hardware Caching Techniques for Blockchain Key-Value Database. Appl. Sci. 2023, 13, 4092. [Google Scholar] [CrossRef]
- Shin, D.-J.; Kim, J.-J. Cache-Based Matrix Technology for Efficient Write and Recovery in Erasure Coding Distributed File Systems. Symmetry 2023, 15, 872. [Google Scholar] [CrossRef]
- Puidenko, V.; Kharchenko, V. The Minimizating of Hardware for Implementation of Pseudo LRU Algorithm for Cache Memory. In Proceedings of the 2020 IEEE 11th International Conference on Dependable Systems, Services and Technologies (DESSERT), Kyiv, Ukraine, 14–18 May 2020; pp. 65–71. [Google Scholar]
- Perepelitsyn, A. Method of Creation of FPGA based Implementation of Artificial Intelligence as a Service. Radioel. Comp. Syst. 2023, 3, 27–36. [Google Scholar] [CrossRef]
- Perepelitsyn, A.; Kulanov, V.; Zarizenko, I. Method of QoS Evaluation of FPGA as a Service. Radioel. Comp. Syst. 2022, 4, 153–160. [Google Scholar] [CrossRef]
- Barkalov, A.; Titarenko, L.; Mazurkiewicz, M. Foundations of Embedded Systems; Studies in Systems, Decision and Control, V. 195; Springer: Cham, Switzerland, 2019; 167p. [Google Scholar]
The Direction of Research in the Reference | Number of Considered References | ||||||
[6,7,8,9,11,12,13,14,15,16,20,24] | [4,5,10,21] | [17] | [18] | [19,27] | [22,25,26] | [23] | |
Study of replacement policies during the caching of instructions and data in the cache memory of the CPU | ● | ||||||
Study of replacement policies during the caching in the device based on FPGA | ● | ||||||
Research on the adaptive associativity of a reconfigurable cache scheme of the CPU | ● | ||||||
Study of replacement policies in content-oriented computer networks | ● | ||||||
Study scheme for wireless networks for growing data traffic and preventing overload | ● | ||||||
Study of the efficiency of different methods of caching of key-value storage for Blockchain in FPGA | ● | ||||||
Study of a cache-based matrix technique using the HDFS, where the EC volume consists of a Reed–Solomon code with the parameters of (6, 3) | ● |
Num. of q | Components | Number (Logical Elements per Line) |
---|---|---|
q = 4 | Logical elements AND-OR-NOT | 4 |
q = 4 | Logical elements NAND | 4 |
q = 4 | Logical elements AND | 4 |
Total (q = 4): | 12 | |
q = 8 | Logical elements AND-OR-NOT | 6 |
q = 8 | Logical elements NAND | 6 |
q = 8 | Logical elements AND | 9 |
Total (q = 8): | 21 | |
q = 16 | Logical elements AND-OR-NOT | 8 |
q = 16 | Logical elements NAND | 8 |
q = 16 | Logical elements AND | 18 |
Total (q = 16): | 34 | |
q = 32 | Logical elements AND-OR-NOT | 10 |
q = 32 | Logical elements NAND | 10 |
q = 32 | Logical elements AND | 35 |
Total (q = 32): | 55 |
Num. of Ways, q | Lines, N | 2and | 3and | 4and | τdec | |
---|---|---|---|---|---|---|
4 | 2 | 1 | 0 | 0 | 1 | τ |
8 | 3 | 0 | 1 | 0 | 1 | τ |
16 | 4 | 0 | 0 | 1 | 1 | τ |
32 | 5 | 1 | 0 | 1 | 2 | 2τ |
64 | 6 | 2 | 0 | 1 | 3 | 2τ |
128 | 7 | 1 | 1 | 1 | 3 | 2τ |
256 | 8 | 1 | 0 | 2 | 3 | 2τ |
512 | 9 | 0 | 1 | 2 | 3 | 2τ |
1024 | 10 | 1 | 1 | 2 | 4 | 2τ |
2048 | 11 | 0 | 2 | 2 | 4 | 2τ |
Num. of Ways, q | Total Complexity in Gates per Line, CPLRUt | Complexity of Memory Elements | Complexity of Decoder of Ways of Memory | Synthesized and Predicted Additional Gate Complexity | |||
---|---|---|---|---|---|---|---|
Cmem | τmem | Cdec | τdec | Cadd_log | CPLRUt (%) | ||
4 | 12 | 8 | 3τ | 4 | τ | 0 | 0.00 |
8 | 21 | 12 | 3τ | 8 | τ | 1 | 4.76 |
16 | 34 | 16 | 3τ | 16 | τ | 2 | 5.88 |
32 | 55 | 20 | 3τ | 64 | 2τ | 3 | 5.46 |
64 | 220 | 24 | 3τ | 192 | 2τ | 4 | 1.82 |
128 | 417 | 28 | 3τ | 384 | 2τ | 5 | 1.20 |
256 | 806 | 32 | 3τ | 768 | 2τ | 6 | 0.74 |
512 | 1579 | 36 | 3τ | 1536 | 2τ | 7 | 0.05 |
1024 | 4144 | 40 | 3τ | 4096 | 2τ | 8 | 0.19 |
2048 | 8245 | 44 | 3τ | 8192 | 2τ | 9 | 0.11 |
Num. of Ways, q | Complexity in Gates per Line, CPLRUt | Reliability, P (t = 10,000) | Delay, τ |
---|---|---|---|
4 | 12 | 0.9981 | 4τ |
8 | 21 | 0.9792 | 4τ |
16 | 34 | 0.9665 | 4τ |
32 | 55 | 0.9465 | 5τ |
64 | 220 | 0.8025 | 5τ |
128 | 417 | 0.6590 | 5τ |
256 | 806 | 0.4467 | 5τ |
512 | 1579 | 0.2062 | 5τ |
1024 | 4144 | 0.0159 | 5τ |
2048 | 8245 | 0.0003 | 5τ |
Num. of q | Symbol | Components | Number of Gates (V) per Line |
---|---|---|---|
q = 4 | Cmem | Logical elements NAND with 3 inputs | 8 |
q = 4 | Logical elements AND with 4 inputs | 8 | |
q = 4 | Cadd_log | Logical elements AND with 2 inputs | 1 |
q = 4 | Cdec | Logical elements AND with 2 inputs | 1 |
q = 4 | Logical elements AND with 3 inputs | 1 | |
q = 4 | Logical elements AND with 4 inputs | 1 | |
Ctotal (q = 4) | V = 20 |
Num. of q | Symbol | Components | Number of Gates (V) per Line |
---|---|---|---|
q = 8 | Cmem | Logical elements NAND with 3 inputs | 16 |
q = 8 | Logical elements AND with 4 inputs | 16 | |
q = 8 | Cdec | Logical elements AND with 2 inputs | 6 |
q = 8 | Logical elements AND with 3 inputs | 2 | |
q = 8 | Logical elements AND with 4 inputs | 3 | |
Ctotal (q = 8) | V = 43 | ||
q = 16 | Cmem | Logical elements NAND with 3 inputs | 32 |
q = 16 | Logical elements AND with 4 inputs | 32 | |
q = 16 | Cdec | Logical elements AND with 2 inputs | 8 |
q = 16 | Logical elements AND with 3 inputs | 8 | |
q = 16 | Logical elements AND with 4 inputs | 11 | |
Ctotal (q = 16) | V = 91 |
Num. of q | Symbol | Components | Number of Gates (V) per Line |
---|---|---|---|
q = 32 | Cmem | Logical elements NAND with 3 inputs | 64 |
q = 32 | Logical elements AND with 4 inputs | 64 | |
q = 32 | Cdec | Logical elements AND with 2 inputs | 13 |
q = 32 | Logical elements AND with 3 inputs | 18 | |
q = 32 | Logical elements AND with 4 inputs | 44 | |
Ctotal (q = 32) | V = 203 |
Policy | Ways | Single Cache Line | LUT | LUT (Table 3) [10] | FF | FF (Table 3) [10] |
---|---|---|---|---|---|---|
PLRUm | 4 | 1 | 8 | 8 | 4 | 4 |
PLRUt | 4 | 1 | 6 | 6 | 2 | 3 |
PLRUm | 8 | 1 | 16 | 22 | 8 | 8 |
PLRUt | 8 | 1 | 11 | 13 | 3 | 7 |
PLRUm | 16 | 1 | 32 | – | 16 | – |
PLRUt | 16 | 1 | 20 | – | 4 | – |
PLRUm | 32 | 1 | 64 | – | 32 | – |
PLRUt | 32 | 1 | 37 | – | 5 | – |
Policy | Ways | Single Cache Line | LUT | LUT (Table 3) [10] | FF | FF (Table 3) [10] |
---|---|---|---|---|---|---|
PLRUt | 64 | 1 | 70 | – | 6 | – |
PLRUt | 128 | 1 | 135 | – | 7 | – |
PLRUt | 256 | 1 | 264 | – | 8 | – |
PLRUt | 512 | 1 | 521 | – | 9 | – |
PLRUt | 1024 | 1 | 1034 | – | 10 | – |
PLRUt | 2048 | 1 | 2059 | – | 11 | – |
Num. of Ways, q | Gate Complexity, CPLRUt | Gate Complexity, CPLRUm | Reliability, P (t = 10,000) PLRUt | Reliability, P (t = 10,000) PLRUm | Delay, τ PLRUt | Delay, τ PLRUm |
---|---|---|---|---|---|---|
4 | 12 | 20 | 0.9981 | 0.9802 | 4τ | 4τ |
8 | 21 | 43 | 0.9792 | 0.9579 | 4τ | 5τ |
16 | 34 | 91 | 0.9665 | 0.9130 | 4τ | 5τ |
32 | 55 | 203 | 0.9465 | 0.8163 | 5τ | 6τ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Titarenko, L.; Kharchenko, V.; Puidenko, V.; Perepelitsyn, A.; Barkalov, A. Hardware-Based Implementation of Algorithms for Data Replacement in Cache Memory of Processor Cores. Computers 2024, 13, 166. https://doi.org/10.3390/computers13070166
Titarenko L, Kharchenko V, Puidenko V, Perepelitsyn A, Barkalov A. Hardware-Based Implementation of Algorithms for Data Replacement in Cache Memory of Processor Cores. Computers. 2024; 13(7):166. https://doi.org/10.3390/computers13070166
Chicago/Turabian StyleTitarenko, Larysa, Vyacheslav Kharchenko, Vadym Puidenko, Artem Perepelitsyn, and Alexander Barkalov. 2024. "Hardware-Based Implementation of Algorithms for Data Replacement in Cache Memory of Processor Cores" Computers 13, no. 7: 166. https://doi.org/10.3390/computers13070166
APA StyleTitarenko, L., Kharchenko, V., Puidenko, V., Perepelitsyn, A., & Barkalov, A. (2024). Hardware-Based Implementation of Algorithms for Data Replacement in Cache Memory of Processor Cores. Computers, 13(7), 166. https://doi.org/10.3390/computers13070166