A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM
Abstract
:1. Introduction
2. Related Work
3. CiM_SRAM-Based BNN_Accel
3.1. CiM_SRAM Subarray
- Based on an input row address, the signal B turns on all the NB transistors in the selected row.
- Based on inputs from the outside of the subarray, each signal L/R turns on/off the transistor NL/NR.
- As the stored value (Q/QB) in each cell is passed to Cout, the XNOR computation is performed.
- The XNOR results of all the cells in the selected row are passed to the Popcount unit through BLs.
- XAC is completed as the Popcount unit computes the popcount.
3.2. Overall Structure of BNN_Accel
3.3. Data Mapping for Parallel Computations in BNN_Accel
4. Evaluation
4.1. Evaluation Methodology
- We modeled the impact of additional transistors and wires on latency and energy in our modified CACTI, based on the Horowitz equation [11].
- We modeled the latency and energy of MIVs by adopting the MIV specification from [8].
- We modeled the area of CiM_SRAMs reflecting the cell structures; CiM_SRAM (M3D_2L) and CiM_SRAM (M3D_4L) had the same cell area as the conventional 6T SRAM, while CiM_SRAM (2D) had 1.5× larger cell area than the 6T SRAM.
4.2. Evaluation Results
4.2.1. Latency and Energy of CiM_SRAM
- A read operation in the 6T SRAM uses a smaller number of transistors than an XAC operation in CiM_SRAM (2D), which leads to shorter latency.
- The 6T SRAM is composed of a smaller number of transistors/wires than CiM_SRAM (2D), which leads to lower parasitic capacitance.
- Thanks to the transistor-level M3D integration, CiM_SRAM (M3D_4L) had 33.8% smaller cell area than CiM_SRAM (2D).
- CiM_SRAM (M3D_4L) had shorter column-wise wires by adopting the M3D BLP.
- Thanks to both (1) and (2), CiM_SRAM (M3D_4L) had smaller subarrays, which eventually reduced the length of the routing wires between subarrays.
4.2.2. Execution Time and Energy of BNN_Accel
4.2.3. Peak Temperature of BNN_Accel
4.3. Discussion on Monolithic 3D Fabrication Cost
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gong, Y.-H.; Kong, J.; Chung, S.W. Quantifying the Impact of Monolithic 3D (M3D) Integration on L1 Caches. IEEE Trans. Emerg. Top. Comput. 2019, 1. [Google Scholar] [CrossRef]
- Agrawal, A.; Jaiswal, A.; Roy, D.; Han, B.; Srinivasan, G.; Ankit, A.; Roy, K. Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 3064–3076. [Google Scholar] [CrossRef] [Green Version]
- Synopsys Design Compiler. Available online: https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/design-compiler-graphical.html (accessed on 15 February 2021).
- Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. Available online: https://arxiv.org/pdf/1602.02830.pdf (accessed on 17 March 2016).
- Jiang, J.; Parto, K.; Cao, W.; Banerjee, K. Monolithic-3D Integration with 2D Materials: Toward Ultimate Vertically-Scaled 3D-ICs. In Proceedings of the 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Burlingame, CA, USA, 15–18 October 2018; pp. 1–3. [Google Scholar]
- Jiang, L.; Zhao, B.; Yang, J.; Zhang, Y. Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology. ACM Trans. Des. Autom. Electron. Syst. 2015, 20, 1–24. [Google Scholar] [CrossRef]
- Jiang, Z.; Yin, S.; Seo, J.-S.; Seok, M. XNOR-SRAM: In-Bitcell Computing SRAM Macro based on Resistive Computing Mechanism. In Proceedings of the 2019 on Great Lakes Symposium on VLSI, Tysons Corner, VA, USA, 9–11 May 2019; pp. 417–422. [Google Scholar]
- Zhang, R.; Stan, M.R.; Skadron, K. Hotspot 6.0: Validation, Acceleration and Extension. Available online: https://www.cs.virginia.edu/~skadron/Papers/HotSpot60_TR.pdf (accessed on 26 August 2015).
- Shao, Y.S.; Xi, S.L.; Srinivasan, V.; Wei, G.-Y.; Brooks, D. Co-designing accelerators and SoC interfaces using gem5-Aladdin. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016; pp. 1–12. [Google Scholar]
- Dey, S.; Guajardo, E.Z.; Basireddy, K.R.; Wang, X.; Singh, A.K.; McDonald-Maier, K. EdgeCoolingMode: An Agent Based Thermal Management Mechanism for DVFS Enabled Heterogeneous MPSoCs. In Proceedings of the 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID), Delhi, India, 5–9 January 2019; pp. 19–24. [Google Scholar]
- Synopsys IC Compiler. Available online: https://www.synopsys.com/implementation-and-signoff/physical-implementation/ic-compiler.html (accessed on 15 February 2021).
- Kim, Y.G.; Kim, M.; Kong, J.; Chung, S.W. An Adaptive Thermal Management Framework for Heterogeneous Multi-Core Processors. IEEE Trans. Comput. 2020, 69, 894–906. [Google Scholar] [CrossRef]
- Hsueh, F.-K.; Chiu, H.-Y.; Shen, C.-H.; Shieh, J.-M.; Tang, Y.-T.; Yang, C.-C.; Chen, H.-C.; Huang, W.-H.; Chen, B.-Y.; Chen, K.-M.; et al. TSV-free FinFET-based Monolithic 3D+-IC with computing-in-memory SRAM cell for intelligent IoT devices. In Proceedings of the 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2–6 December 2017; pp. 12.6.1–12.6.4. [Google Scholar] [CrossRef]
- Giacomin, E.; Greenberg-Toledo, T.; Kvatinsky, S.; Gaillardon, P.-E. A Robust Digital RRAM-Based Convolutional Block for Low-Power Image Processing and Learning Applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 66, 643–654. [Google Scholar] [CrossRef]
- Yin, S.; Jiang, Z.; Kim, M.; Gupta, T.; Seok, M.; Seo, J.-S. Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 28, 48–61. [Google Scholar] [CrossRef]
- Mittal, S. A Survey of ReRAM-Based Architectures for Processing-In-Memory and Neural Networks. Mach. Learn. Knowl. Extr. 2019, 1, 75–114. [Google Scholar] [CrossRef] [Green Version]
- Li, S.; Chen, K.; Ahn, J.H.; Brockman, J.B.; Jouppi, N.P. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. In Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 7–10 November 2011; pp. 694–701. [Google Scholar]
- Liu, R.; Peng, X.; Sun, X.; Khwa, W.-S.; Si, X.; Chen, J.-J.; Li, J.-F.; Chang, M.-F.; Yu, S. Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks. In Proceedings of the 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 24–28 June 2018; pp. 1–6. [Google Scholar]
- Hsueh, F.-K.; Lee, C.-Y.; Xue, C.-X.; Shen, C.-H.; Shieh, J.-M.; Chen, B.-Y.; Chiu, Y.-C.; Chen, H.-C.; Kao, M.-H.; Huang, W.-H.; et al. Monolithic 3D SRAM-CIM Macro Fabricated with BEOL Gate-All-Around MOSFETs. In Proceedings of the 2019 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 7–11 December 2019; pp. 3.3.1–3.3.4. [Google Scholar]
- Or-Bach, Z.; Cronquist, B.; Wurman, Z.; Beinglass, I.; Henning, A.K. Modified ELTRAN® A game changer for Monolithic 3D. In Proceedings of the 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Rohnert Park, CA, USA, 5–8 October 2015; pp. 1–3. [Google Scholar]
Popcount Unit | Input Buffer | Global Periphery Logic | |
---|---|---|---|
Latency | 0.43 ns | 0.28 ns | 0.64 ns |
Dynamic power | 2.77 mW | 35.88 mW | 3.07 mW |
Leakage power | 2.21 µW | 29.32 µW | 2.63 µW |
CiM_SRAM (2D) | CiM_SRAM (M3D_2L) | CiM_SRAM (M3D_4L) | |
---|---|---|---|
Latency | 1.000 | 0.893 (10.7% lower) | 0.634 (36.6% lower) |
Energy | 1.000 | 1.344 (34.4% higher) | 0.855 (14.5% lower) |
Area | 1.000 | 0.711 (28.9% smaller) | 0.375 (62.5% smaller) |
BNN_Accel (2D) | BNN_Accel (M3D_2L) | BNN_Accel (M3D_4L) | |
---|---|---|---|
Exec. time | 1.000 | 0.801 (19.9% faster) | 0.601 (39.9% faster) |
Energy | 1.000 | 1.133 (13.3% higher) | 0.768 (23.2% lower) |
Area | 1.000 | 0.759 (24.1% smaller) | 0.479 (53.1% smaller) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Choi, J.H.; Gong, Y.-H.; Chung, S.W. A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM. Electronics 2021, 10, 623. https://doi.org/10.3390/electronics10050623
Choi JH, Gong Y-H, Chung SW. A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM. Electronics. 2021; 10(5):623. https://doi.org/10.3390/electronics10050623
Chicago/Turabian StyleChoi, Jeong Hwan, Young-Ho Gong, and Sung Woo Chung. 2021. "A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM" Electronics 10, no. 5: 623. https://doi.org/10.3390/electronics10050623
APA StyleChoi, J. H., Gong, Y.-H., & Chung, S. W. (2021). A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM. Electronics, 10(5), 623. https://doi.org/10.3390/electronics10050623