A Multi-Bit Quantization Low-Latency Voltage Sense Amplifier Applied in RRAM Computing-in-Memory Macro Circuits
Abstract
:1. Introduction
- (1)
- The introduction of multi-bit quantization technology allows for two-bit digital output in a single quantization cycle using two voltage references, improving upon the inefficiency of conventional VSA’s one-bit-per-cycle mode. This reduces the number of operational states and thereby decreases readout latency;
- (2)
- The use of simple combinational logic circuits for two-bit quantization avoids the complexity of comparators, registers, and other circuits found in conventional VSA, simplifying the design and reducing the complexity of sequential control signals, further optimizing readout latency performance.
2. Architecture of CIM Macro Based on RRAM Arrays
2.1. Composition of the Architecture
- (1)
- The RRAM array stores the convolution kernel’s weight information (W[k:0]) and reads the stored weights via the WL drive module;
- (2)
- The peripheral circuit is responsible for performing subsequent multiply-accumulate (MAC) operations. Under the control of sequential signals generated by the CIM Control module, the analog output from the RRAM array is quantized by the VSA circuit, eventually outputting the digital result of multiple multiplication calculations (). Thus, the readout latency of the VSA significantly impacts the system’s overall performance.
2.2. Architectural Computation Sequence
3. The Proposed MQL-VSA
- (1)
- The sampling structure, consisting of four sampling switches (SW3–SW6) and two sampling capacitors (C0, C1), samples the voltage values of VREFL, VREFH, and VSUM (VREFL = 1/4 VDD, VREFH = 3/4 VDD, VSUM is the input voltage that is quantized);
- (2)
- The LSB sensing includes three MOS transistors NO (N1), P0 (P1), P2 (P3), a switch SW1 (SW2), and an inverter, generating the OUT2 (OUT2B) signal for the LSB-detecting circuit;
- (3)
- The latch comprises two cascaded inverters formed by N2, P4 and N3, P5 along with two switching MOS transistors (N4, P6), producing SAOUT[1] by comparing the voltages at nodes Q1B and Q1;
- (4)
- The LSB-detecting circuit, a 2-to-1 selector, selects between OUT2 and not (OUT2B) based on the value of SAOUT[1] to determine the SAOUT[0] result.
3.1. Architectural Computation Sequence
3.2. Workflow of the MQL-VSA
- (1)
- PH1. This phase carries out the sampling of VSUM, VREFL, and VREFH. As shown in Figure 6a, SW3 and SW4 are on to pass VSUM to X1 and VREFL to X2, respectively. Concurrently, SW1 and SW2 are on, distributing the charge from nodes Q1B, Q1 to nodes Q2B, Q2, resulting in all four nodes attaining a voltage of 1/2 VDD;
- (2)
- PH2. This phase achieves voltage differencing and coupling. Illustrated in Figure 6b, SW5 and SW6 are on, resulting in a voltage swing of at X2and of atX1. Since the voltages across capacitors C0 and C1 cannot change abruptly, , . Concurrently, SW1 and SW2 are turned off, and N0, P0 and N1, P1 formed inverter structures, causing voltage swings at nodes Q2B and Q2 opposite to those at Q1B and Q1, respectively;
- (3)
- PH3. This phase accomplishes the output of SAOUT[0] and SAOUT[1]. As Figure 6c illustrates, with SAEN1 = 1, the voltages at Q2 and Q2B are processed through inverters to generate OUT2 and OUT2B, respectively. Then, with SAEN2 = 1, the latch is activated to compare the voltages at nodes Q1andQ1B, determining the relationships of and and outputs the MSB of SAOUT[1:0]. Simultaneously, the 2-to-1 selector activates, outputting SAOUT[0] as the when SAOUT[1] = 1 and as the OUT2 when SAOUT[1] = 0.
3.3. Sizing Methodology for MOS Transistors, Switches, and Capacitors in MQL-VSA
- (1)
- Latch module. The latch module in Figure 3 consists of inverters sequentially connected, formed by N3, P5 and N2, P4. Its equivalent circuit is illustrated in Figure 7. Due to the circuit’s symmetrical structure, the transconductance (gm), output impedance (Rout), and load capacitance (CL) of the two inverters are equal. Based on Kirchhoff’s Law and the analysis method for first-order circuit time domain responses [15], we can derive the following at nodes X and Y:
- (2)
- LSB-sensing module. As shown in Figure 3, the primary function of this module is to sense the voltage changes at node Q1 (Q1B). Thus, during the PH1 phase of the circuit, the trip-point voltage VTRIP of the inverter made up of N1, P1 (N0, P0) should equal the VQ1B voltage at that time, which is VTRIP = VQ1B = 1/2 VDD. Due to the different mobilities of N-channel and P-channel MOSFETs, the W/L of P1 (P0) should be about 4 to 5 times that of N1 (N0) to set the inverter’s trip-point voltage VTRIP to 1/2 VDD. P2 (P3) is used as a charging transistor, so the smallest size is adequate.
- (3)
- Sampling module. According to Figure 3, when the capacitors and switches are in operation, they can be equivalently analyzed as a low-pass filter. The equivalent circuit is shown in Figure 8, where Ron represents the on-resistance of the switch. Since the switches in Figure 3 are all transmission gate structures, Ron can be expressed as:
3.4. Advantages of the MQL-VSA in Readout Latency
4. Experiments Result and Comparison
4.1. Verification of MQL-VSA Functionality
- (1)
- Testing the functionality and readout latency performance of the MQL-VSA;
- (2)
- Evaluating the readout latency characteristics of the RRAM array-based CIM macro circuit which utilizes MQL-VSA. The macro circuit employs the CIM architecture shown in Figure 10, with parameters set to m = 8, k = 8, and n = 14, forming an 8-bit input, 8-bit weight, 14-bit output CIM macro. The RRAM array consists of TiN/HfO2/TiN-based binary memristors [18,19], with the memristor unit simulated using the MuHAM model [20,21,22]. The array size is 0.5-Mb (1024 rows × 512 columns).
- (1)
- In the first cycle, with VREFH = 1.35 V and VREFL = 0.45 V, the quantization results of MQL-VSA are SAOUT[1]=1 and SAOUT[0] = 1, respectively;
- (2)
- In the second cycle, with VREFH = 1.6875 V and VREFL = 1.4625 V, the quantization results are SAOUT[1] = 1 and SAOUT[0] = 1.
- (1)
- (2)
- To reduce overshoots in the analog signals of Figure 11 (including Q1, Q1B, Q2, Q2B, SAOUT[1]), a non-overlapping control signal for all switches was employed. For example, during the transition from PH1 to PH2 phase, the falling edge of SW12 and SW34 and the rising edge of SW56 are non-overlapping. This prevents simultaneous conduction of switches during state transitions, thereby only minor overshoots occur in the analog signals, which are clear and well-defined, and these overshoots do not affect the final quantization outcome.
- (1)
- In the PVT experiments, five MOS corners (ff, tt, ss, sf, fs), three capacitor corners (ff, tt, ss), three power supply voltages (1.44 V, 1.8 V, and 2.16 V, which represent a ±20% fluctuation in supply voltage), and three temperature environments (−40 °C, 27 °C, 125 °C) were selected, totaling 135 scenarios. In all 135 different PVT conditions, the MQL-VSA consistently produced accurate quantization results. For instance, with input voltages of 0.36 V, 0.99 V, and 1.7 V, the MQL-VSA output the binary codes 0010, 1001, and 1111, respectively, in each of the 135 PVT experiments, demonstrating robust PVT characteristics.
- (2)
- In the Monte Carlo simulations, the MQL-VSA underwent 200 tests, consistently yielding correct quantization results. For example, with input voltages of 0.36 V, 0.99 V, and 1.7 V, the MQL-VSA output the binary codes 0010, 1001, and 1111, respectively, in each of the 200 tests. This indicates that even in the presence of device mismatches, the MQL-VSA maintains its accurate quantization capabilities.
4.2. Performance Comparison of MQL-VSA and Conventional VSA
4.3. Comparison Based on CIM Architectures
4.4. Comparison with Previous SAs
5. Discussion
- (1)
- (2)
- Area: as seen from Figure 3, the MQL-VSA circuit uses simple components like inverters, switches, capacitors, and latch structures, avoiding the complex circuit elements like dynamic comparators and multiplexers found in conventional VSA. This simplicity also translates to a reduction in area usage;
- (3)
- Common mode input range: conventional VSA structures employing dynamic comparators are limited in their common-mode input level (which must exceed the threshold voltage of the comparator’s input pair). Thus, they require an additional common-mode level input to function correctly. However, MQL-VSA, utilizing a latch structure instead of dynamic comparators, has no such limitation in its input common-mode range, indirectly reducing the circuit’s power consumption.
6. Conclusions
- (1)
- Utilizes multi-bit quantization technology to reduce the number of operational states in each quantization cycle by quantizing 2-bit data within one cycle;
- (2)
- Employs combinational logic circuits and Latch structures for MSB and LSB detection, decreasing the complexity of sequential control signals and further optimizing readout latency while simplifying the circuit structure.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hung, J.M.; Wen, T.H.; Huang, Y.H.; Huang, S.P.; Chang, F.C.; Su, C.I.; Khwa, W.S.; Lo, C.C.; Liu, R.S.; Hsieh, C.C.; et al. 8-b Precision 8-Mb ReRAM Compute-in-Memory Macro Using Direct-Current-Free Time-Domain Readout Scheme for AI Edge Devices. IEEE J. Solid-State Circuits 2023, 58, 303–315. [Google Scholar] [CrossRef]
- Yoon, J.H.; Chang, M.; Khwa, W.S.; Chih, Y.D.; Chang, M.F.; Raychowdhury, A. A 40-nm, 64-Kb, 56.67 TOPS/W Voltage-Sensing Computing-In-Memory/Digital RRAM Macro Supporting Iterative Write with Verification and Online Read-Disturb Detection. IEEE J. Solid-State Circuits 2022, 57, 68–79. [Google Scholar] [CrossRef]
- Yoon, J.H.; Chang, M.; Khwa, W.S.; Chih, Y.D.; Chang, M.F.; Raychowdhury, A. A 40-nm 118.44-TOPS/W Voltage-Sensing Compute-in-Memory RRAM Macro with Write Verification and Multi-Bit Encoding. IEEE J. Solid-State Circuits 2022, 57, 845–857. [Google Scholar] [CrossRef]
- Zhang, F.; He, W.; Yeo, I.; Liehr, M.; Cady, N.; Cao, Y.; Seo, J.S.; Fan, D. A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment. In Proceedings of the ESSCIRC 2023 IEEE 49th European Solid State Circuits Conference (ESSCIRC), Lisbon, Portugal, 11–14 September 2023; pp. 117–120. [Google Scholar] [CrossRef]
- He, Y.; Yue, J.; Feng, X.; Huang, Y.; Jia, H.; Wang, J.; Zhang, L.; Sun, W.; Yang, H.; Liu, Y. An RRAM-Based Digital Computing-in-Memory Macro With Dynamic Voltage Sense Amplifier and Sparse-Aware Approximate Adder Tree. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 416–420. [Google Scholar] [CrossRef]
- Chang, M.F.; Wu, J.J.; Chien, T.F.; Liu, Y.C.; Yang, T.C.; Shen, W.C.; King, Y.C.; Lin, C.J.; Lin, K.F.; Chih, Y.D.; et al. 19.4 embedded 1Mb ReRAM in 28nm CMOS with 0.27-to-1V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme. In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, 9–13 February 2014; pp. 332–333. [Google Scholar] [CrossRef]
- Husain, F.; Iqbal, B.; Grover, A. A 0.4µA Offset, 6ns Sensing-time Multi-level Sense Amplifier for Resistive Non-Volatile Memories in 65nm LSTP Technology. In Proceedings of the 2021 34th International Conference on VLSI Design and 2021 20th International Conference on Embedded Systems (VLSID), Guwahati, India, 20–24 February 2021; pp. 76–81. [Google Scholar] [CrossRef]
- Ye, W.; Wang, L.; Zhou, Z.; An, J.; Li, W.; Gao, H.; Li, Z.; Yue, J.; Hu, H.; Xu, X.; et al. A 28-nm RRAM Computing-in-Memory Macro Using Weighted Hybrid 2T1R Cell Array and Reference Subtracting Sense Amplifier for AI Edge Inference. IEEE J. Solid-State Circuits 2023, 58, 2839–2850. [Google Scholar] [CrossRef]
- An, B.K.; Zhang, X.; Do, A.T.; Kim, T.T.H. Design of a Current Sense Amplifier with Dynamic Reference for Reliable Resistive Memory. In Proceedings of the 2023 21st IEEE Interregional NEWCAS Conference (NEWCAS), Edinburgh, UK, 26–28 June 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Zhang, H.; Lu, L. A Low-Voltage Sense Amplifier for Embedded Flash Memories. IEEE Trans. Circuits Syst. II Express Briefs 2015, 62, 236–240. [Google Scholar] [CrossRef]
- Xue, C.X.; Huang, T.Y.; Liu, J.S.; Chang, T.W.; Kao, H.Y.; Wang, J.H.; Liu, T.W.; Wei, S.Y.; Huang, S.P.; Wei, W.C.; et al. 15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference—(ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 244–246. [Google Scholar] [CrossRef]
- Zhang, W.; Yao, P.; Gao, B.; Liu, Q.; Wu, D.; Zhang, Q.; Li, Y.; Qin, Q.; Li, J.; Zhu, Z.; et al. Edge learning using a fully integrated neuro-inspired memristor chip. Science 2023, 381, 1205–1211. [Google Scholar] [CrossRef] [PubMed]
- Hung, J.-M.; Xue, C.-X.; Kao, H.-Y.; Huang, Y.-H.; Chang, F.-C.; Huang, S.-P.; Liu, T.-W.; Jhang, C.-J.; Su, C.-I.; Khwa, W.-S.; et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 2021, 4, 921–930. [Google Scholar] [CrossRef]
- Mohammad, B.; Dadabhoy, P.; Lin, K.; Bassett, P. Comparative study of current mode and voltage mode sense amplifier used for 28nm SRAM. In Proceedings of the 2012 24th International Conference on Microelectronics (ICM), Algiers, Algeria, 16–20 December 2012; pp. 1–6. [Google Scholar] [CrossRef]
- Liu, W.; Mi-Chang, C. Transistor transient studies including transcapacitive current and distributive gate resistance for inverter circuits. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 1998, 45, 416–422. [Google Scholar] [CrossRef]
- Zhang, J.; Ren, X.; Liu, S.; Chan, C.H.; Zhu, Z. An 11-bit 100-MS/s Pipelined-SAR ADC Reusing PVT-Stabilized Dynamic Comparator in 65-nm CMOS. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 1174–1178. [Google Scholar] [CrossRef]
- Bagali, V.; Jiddi, V.; Mujawar, R. PMOS Biased Sense Amplifier Using Cadence Virtuoso With gpdk 90nm Technology. In Proceedings of the 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon), Vijaypur, India, 20–21 November 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Zhao, L.; Chen, Z.; Manea, D.; Li, S.; Li, J.; Zhu, Y.; Sui, Z.; Lu, Z. Highly Reliable 40nm Embedded Dual-Interface-Switching RRAM Technology for Display Driver IC Applications. In Proceedings of the 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, 12–17 June 2022; pp. 316–317. [Google Scholar] [CrossRef]
- Hu, W.; Luo, H.; Chen, C.; Wei, R. Analytic Modeling for RRAM Based on Multistage Homotopy Analysis Method. IEEE Trans. Nanotechnol. 2020, 19, 179–191. [Google Scholar] [CrossRef]
- Dai, G.; Du, X.; Xie, W.; Ni, T.; Han, M.; Wu, D. Circuit Design of 3- and 4-Bit Flash Analog-to-Digital Converters Based on Memristors. Electronics 2023, 12, 4069. [Google Scholar] [CrossRef]
- Darwish, H.; Reig, C.; Leger, G. Implementation of Background Calibration for Redundant FLASH ADC. Electronics 2023, 12, 4559. [Google Scholar] [CrossRef]
- Tang, J.; Wang, Y.; Gu, H.; Zou, W. Design of Self-Calibration Comparator for 12-Bit SAR ADCs. Electronics 2023, 12, 2277. [Google Scholar] [CrossRef]
MOS Transistor/Switch/Capacitor | Size |
---|---|
N0, N1 | W = 220 nm, L = 180 nm |
N2–N4 | W = 880 nm, L = 180 nm |
P0, P1 | W = 970 nm, L = 180 nm |
P2, P3 | W = 220 nm, L = 180 nm |
P4–P6 | W = 880 nm, L = 180 nm |
SW1–SW6 | PMOS: W = 7.5 m, L = 180 nm NMOS: W = 2.5 m, L = 180 nm |
C0 C1 | W = 10 m, L = 10 m Capacitance = 197.5 fF |
[7] | [8] | [9] | [10] | This Work | |
---|---|---|---|---|---|
Technology node (nm) | 65 | 28 | 40 | 130 | 180 |
Memory | N/A | RRAM | RRAM | Flash | RRAM |
Supply voltage (V) | 1 | 0.8 | 1.1 | 1.2 | 1.8 |
Sensing approach | Current | Current | Current | Voltage | Voltage |
Power (W) | 59.38 | 7.132 | 48.44 | 90.42 | 70.64 |
Readout latency (ns) | 30 | 26.4 | 14 | 70 | 50 |
Quantify bit/cycle | 1 | 1 | 1 | 1 | 2 |
FoM * | 3.65 | 14.87 | 5.90 | 2.05 | 10.19 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hu, W.; Zhang, H.; Wei, R.; Chen, Q. A Multi-Bit Quantization Low-Latency Voltage Sense Amplifier Applied in RRAM Computing-in-Memory Macro Circuits. Electronics 2024, 13, 356. https://doi.org/10.3390/electronics13020356
Hu W, Zhang H, Wei R, Chen Q. A Multi-Bit Quantization Low-Latency Voltage Sense Amplifier Applied in RRAM Computing-in-Memory Macro Circuits. Electronics. 2024; 13(2):356. https://doi.org/10.3390/electronics13020356
Chicago/Turabian StyleHu, Wei, Hangze Zhang, Rongshan Wei, and Qunchao Chen. 2024. "A Multi-Bit Quantization Low-Latency Voltage Sense Amplifier Applied in RRAM Computing-in-Memory Macro Circuits" Electronics 13, no. 2: 356. https://doi.org/10.3390/electronics13020356
APA StyleHu, W., Zhang, H., Wei, R., & Chen, Q. (2024). A Multi-Bit Quantization Low-Latency Voltage Sense Amplifier Applied in RRAM Computing-in-Memory Macro Circuits. Electronics, 13(2), 356. https://doi.org/10.3390/electronics13020356