TB-NUCA: A Temperature-Balanced 3D NUCA Based on Bayesian Optimization
Abstract
1. Introduction
- We analyze the performance of 3D NoC with a uniform 3D NUCA mapping scheme and experimentally show that the topmost node’s temperature is the whole structure’s bottleneck;
- We design a non-uniform 3D NUCA mapping scheme that achieves a better-balanced thermal distribution in 3D chips;
- We define an optimization objective as the normalized product of temperature and delay, and use a Bayesian optimization algorithm to optimize it.
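As a rough illustration of the third contribution, the sketch below computes a normalized temperature-delay product for one candidate mapping against a baseline. The function name `normalized_product`, the choice of maximum temperature as the temperature term, and normalization by the S-NUCA baseline are assumptions made for this example; the numbers are the uniform-random values reported in the result tables.

```python
def normalized_product(temp, delay, temp_ref, delay_ref):
    """Product of temperature and delay, each normalized by a reference value.

    Hypothetical helper: the exact normalization is assumed here, not taken
    from the paper's definition.
    """
    if temp_ref <= 0 or delay_ref <= 0:
        raise ValueError("reference values must be positive")
    return (temp / temp_ref) * (delay / delay_ref)


# Uniform-random traffic, values from the result tables.
baseline = {"max_temp": 90.8444, "latency": 9.669}    # S-NUCA
candidate = {"max_temp": 86.3749, "latency": 9.8734}  # TB-NUCA

score = normalized_product(
    candidate["max_temp"], candidate["latency"],
    baseline["max_temp"], baseline["latency"],
)
print(f"normalized temperature-delay product: {score:.4f}")
```

Under these assumptions a score below 1 means the temperature reduction outweighs the added latency, which is the trade-off the optimizer is asked to find.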
2. Related Work
2.1. 3D NUCA
2.2. Three-Dimensional Thermal Management Techniques
2.2.1. Thermal-Aware Memory Management
2.2.2. Thermal-Aware Task Scheduling
3. Preliminary
3.1. Baseline 3D NUCA Mapping Scheme
3.2. Bayesian Optimization Algorithm
Algorithm 1 Sequential Model-based Global Optimization. | |
Input: Expensive function $f$, domain $X$, acquisition function $S$, surrogate model $M$ | |
Output: Data set $D$ | |
1: $D \leftarrow \mathrm{InitSamples}(f, X)$ | ▹ Initialize the data set |
2: for $i \leftarrow 1$ to $T$ do | ▹ Set the number of parameter selections to $T$ |
3: $p(y \mid x, D) \leftarrow \mathrm{FitModel}(M, D)$ | ▹ Fit the current data set to obtain the predictive distribution |
4: $x_i \leftarrow \arg\max_{x \in X} S(x, p(y \mid x, D))$ | ▹ Using the predictive distribution, find the extreme point of $S$ |
5: Observe $y_i \leftarrow f(x_i)$ | ▹ Expensive step |
6: $D \leftarrow D \cup \{(x_i, y_i)\}$ | ▹ Update the data set |
7: end for |
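The loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the surrogate is a simple kernel-weighted average standing in for a proper probabilistic model, the acquisition function is a negative lower confidence bound, and `expensive_f`, `fit_model`, `acquisition`, and `smbo` are hypothetical names. The toy objective replaces the costly simulation.

```python
import math
import random


def expensive_f(x):
    """Stand-in for the expensive evaluation (a cheap 1-D toy function)."""
    return (x - 0.3) ** 2


def fit_model(data, bandwidth=0.1):
    """Fit a toy surrogate: returns predict(x) -> (mean, uncertainty)."""
    def predict(x):
        weights = [math.exp(-((x - xi) / bandwidth) ** 2) for xi, _ in data]
        total = sum(weights)
        if total < 1e-12:
            # Far from every observation: fall back to the global mean.
            mean = sum(y for _, y in data) / len(data)
        else:
            mean = sum(w * y for w, (_, y) in zip(weights, data)) / total
        # Uncertainty shrinks as nearby observations accumulate.
        return mean, 1.0 / (1.0 + total)
    return predict


def acquisition(x, predict, kappa=0.5):
    """Negative lower confidence bound; larger is more promising."""
    mean, uncertainty = predict(x)
    return -(mean - kappa * uncertainty)


def smbo(f, domain, n_init=3, budget=20, n_candidates=200, seed=0):
    rng = random.Random(seed)
    lo, hi = domain
    # Step 1: initialize the data set with a few random samples.
    data = [(x, f(x)) for x in (rng.uniform(lo, hi) for _ in range(n_init))]
    for _ in range(budget):                       # Step 2: T selections
        predict = fit_model(data)                 # Step 3: fit the surrogate
        candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
        x_next = max(candidates, key=lambda x: acquisition(x, predict))  # Step 4
        y_next = f(x_next)                        # Step 5: expensive step
        data.append((x_next, y_next))             # Step 6: update the data set
    return data


history = smbo(expensive_f, (0.0, 1.0))
best_x, best_y = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, f(x) = {best_y:.5f}")
```

The acquisition function is maximized here by random candidate search, which keeps the example dependency-free; a practical setup would more likely rely on an optimization library for both the surrogate and the search.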
4. Thermal-Balance Oriented Mapping Scheme
4.1. Bayesian Optimization Algorithm Design to Calculate the Cache Distribution of Each Bank
4.2. Hardware Design
5. Simulation Results
5.1. Simulation Setup
5.2. Analysis of Thermal Distribution and Mean-Time-To-Failure (MTTF)
5.3. Analysis of Traffic Load Distribution and Performance
5.4. Simulation on Benchmark
6. Discussions
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Parameter | Value |
---|---|
Packet size | 8 flits |
Buffer size | 4 flits |
Simulation time | cycles |
Warm-up time | cycles |
Mesh size | |
Traffic pattern | random, transpose-2, shuffle |
Injection rate | 0.08 flits/cycle/node |
Ambient temperature | 45 °C |
Routing algorithm | XYZ |
Temperature statistic | Uniform-Random, S-NUCA | Uniform-Random, TB-NUCA | Transpose-2, S-NUCA | Transpose-2, TB-NUCA | Shuffle, S-NUCA | Shuffle, TB-NUCA |
---|---|---|---|---|---|---|
Max | 90.8444 | 86.3749 | 95.4394 | 92.5412 | 90.5907 | 86.2438 |
Avg. | 86.1727 | 82.3084 | 85.9377 | 82.505 | 83.9708 | 81.0254 |
S.D. | 2.7985 | 2.226 | 3.2783 | 3.233 | 3.1297 | 3.0536 |
Metric | Uniform-Random, S-NUCA | Uniform-Random, TB-NUCA | Transpose-2, S-NUCA | Transpose-2, TB-NUCA | Shuffle, S-NUCA | Shuffle, TB-NUCA |
---|---|---|---|---|---|---|
Throughput (flits/cycle/node) | 0.0789 | 0.0794 | 0.0794 | 0.0788 | 0.0793 | 0.0789 |
Average Latency (cycles) | 9.669 | 9.8734 | 7.773 | 8.449 | 8.221 | 8.669 |
Total Power (J) | 0.0244 | 0.0242 | 0.0239 | 0.0238 | 0.0212 | 0.0213 |
Blackscholes | Max Temp. | Avg. Temp. | S.D. Temp. | Hotspots pct. |
---|---|---|---|---|
S-NUCA | 133.636 | 70.0672 | 16.8398 | 14% |
TB-NUCA | 78.8563 | 58.2297 | 7.00026 | 0% |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, H.; Zhao, Y.; Chen, X.; Li, C.; Lu, J. TB-NUCA: A Temperature-Balanced 3D NUCA Based on Bayesian Optimization. Electronics 2022, 11, 2910. https://doi.org/10.3390/electronics11182910