1. Introduction
The number of transistors that can be integrated into a single chip is still steadily increasing, as predicted by Moore’s law. The FinFET powered the industry throughout the previous decade, and vertically stacked gate-all-around nanosheets are expected to power it for the next two. Although CMOS analog/mixed-signal blocks usually represent a tiny fraction of the billions of transistors on-chip, they dominate from the perspective of design time and design effort. This can be explained by noting that, while digital design automation has been the industry standard for decades, analog design automation is still lagging, and the majority of analog/mixed-signal circuits are still hand-crafted at the transistor level [1,2,3].
Traditionally, analog designers used the concept of overdrive voltage ($V_{ov} = V_{GS} - V_{th}$) to define the transistor bias point in the strong-inversion region. The overdrive voltage is based on the long-channel square-law MOSFET model. However, as the technology minimum feature size has steadily scaled down, transistors have deviated from these simple models, and designers have resorted to lengthy iterations using simulation tools. Not only is this a tedious, time-consuming process, but it also leads to sub-optimal designs. The overdrive voltage has also gradually been replaced with a simulation-based design knob, the drain saturation voltage ($V_{Dsat}$). Since the transition from the triode region to the saturation region is gradual, $V_{Dsat}$ is an ill-defined parameter that mimics the legacy of $V_{ov}$. With the increased importance of low-voltage and low-power design, more transistors are now biased in the moderate- and weak-inversion regions. Consequently, the overdrive voltage concept has lost its significance. Other design knobs that describe the MOSFET bias point across all operating regions, such as the $g_m/I_D$ ratio and the inversion coefficient, have been proposed [4,5].
Analog design automation efforts have been ongoing for more than 30 years. These efforts can generally be divided into two categories: knowledge-based approaches and simulation-based approaches [1,2,3,6]. In the knowledge-based approach, the designer tries to turn their design procedure into a computer program. This can be useful for “personal” design automation, where the designer automates some of their own time-consuming operations. However, it is difficult to generalize and lacks the accuracy of real simulation models. Moreover, it does not lead to optimal solutions. On the other hand, the simulation-based approach relies on invoking the simulator in an optimization loop. This has the advantages of accuracy and optimal solutions, but invoking SPICE in the loop results in long computation times. Moreover, the resultant point may not make sense from a designer’s perspective; thus, the optimization sampling process must be guided by detailed and careful constraints. Hybrid approaches and machine learning/artificial intelligence approaches have also been proposed, but they suffer from accuracy issues when surrogate models are used, or from long computation times when the simulator is invoked in the loop [7,8,9,10,11].
The design of analog circuits using precomputed lookup tables (LUTs) is a promising approach that bridges the gap between the two distant islands of classical handcrafting and black-box simulation-based optimization [12,13,14,15,16,17,18]. The precomputed LUTs are generated by a simulator to abstract the complexity of modern device models; thus, simulation accuracy is preserved. The device data in the LUTs can be manipulated in different scenarios to enable the seamless integration of different design methodologies, such as the $g_m/I_D$ design methodology, which enables intuitive biasing of transistors across all inversion levels [4,5]. The LUTs can be used in a knowledge-based approach to address the accuracy problem at the transistor level. The accuracy problem at the circuit level can be addressed by LUT-based custom solvers in an optimization loop or in a design-space exploration setting. Compared to SPICE-in-the-loop approaches, this solves the long-computation-time problem.
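As a concrete, purely illustrative example of the $g_m/I_D$ sizing flow that the LUTs enable, the sketch below sizes a device from a transconductance specification. All numerical values, including the current density that would normally come from a LUT query, are hypothetical and are not taken from the paper.

```python
# Illustrative g_m/I_D sizing step. All numbers are hypothetical and stand in
# for values that would normally be queried from the precomputed LUTs.
gm_spec = 1e-3       # required transconductance [S]
gm_over_id = 15.0    # chosen inversion level, g_m/I_D [1/V] (moderate inversion)

id_bias = gm_spec / gm_over_id      # bias current needed to meet the gm spec
id_density = 5.0                    # hypothetical I_D/W from a LUT query [A/m]
width = id_bias / id_density        # device width that carries id_bias

print(f"I_D = {id_bias * 1e6:.1f} uA, W = {width * 1e6:.2f} um")
```

The point of the methodology is that the inversion level ($g_m/I_D$) is chosen by the designer, while the technology-dependent mapping to current density and width is delegated to simulator-accurate LUT data.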
The precomputed LUTs are generated once for a given technology; thus, the overhead of the generation process (a few hours per device) is tolerable. However, the memory footprint of the LUTs, which can be up to a few GBs per device depending on the LUT accuracy, is a drawback that the user encounters with every usage. The memory capabilities of computing devices have significantly improved in recent decades, making the usage of large LUTs viable. However, it is still desirable to minimize the memory footprint of the LUTs, especially when designing a circuit that involves a large number of different device types. In this paper, an incomplete-grid LUT is proposed to reduce the MOSFET LUT memory footprint by up to 67%.
The rest of this paper is organized as follows.
Section 2 presents an overview of the MOSFET LUTs.
Section 3 describes the proposed incomplete-grid memory-reduction technique.
Section 4 presents the results and discussion.
Section 5 concludes this paper.
2. The MOSFET Lookup Table (LUT)
Figure 1 shows the testbench used to characterize the MOSFET and build the lookup tables (LUTs). An N-type MOSFET is used for illustration purposes, but the discussion applies to P-type MOSFETs as well. The MOSFET has five degrees of freedom (DoFs). The DoFs are divided into two groups: first, the three terminal voltages $V_{GS}$, $V_{DS}$, and $V_{SB}$; and second, the sizing parameters, channel width $W$ and length $L$. To build an LUT that captures these five DoFs, the LUT would need to be 5D and would have a large size. Fortunately, the MOSFET parameters are directly proportional to the width regardless of the inversion level (weak/moderate/strong inversion) or the mode of operation (triode, pinch-off, velocity saturation). Thus, the LUT can be constructed for a single reference width $W_{ref}$, and linear scaling can be used to calculate the MOSFET parameters at any other width. This linear scaling may incur errors due to stress and narrow-width effects, but these errors can be corrected using small auxiliary LUTs [15].
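The width-scaling step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reference width, the dictionary-of-arrays storage, and all sample values are assumed.

```python
import numpy as np

# Hypothetical illustration of the linear width-scaling step: a LUT slice is
# stored at a reference width W_REF, and width-proportional parameters
# (I_D, g_m, capacitances, ...) are scaled to the queried width.
W_REF = 1e-6  # assumed reference width: 1 um (not a value from the paper)

lut_at_wref = {
    "ID": np.array([1e-6, 5e-6, 20e-6]),     # drain current samples [A]
    "gm": np.array([20e-6, 80e-6, 250e-6]),  # transconductance samples [S]
}

def scale_to_width(lut, w):
    """Linearly scale width-proportional parameters from W_REF to w."""
    k = w / W_REF
    return {name: k * values for name, values in lut.items()}

scaled = scale_to_width(lut_at_wref, 4e-6)   # query a 4 um device
```

Ratio-based quantities such as $g_m/I_D$ are unaffected by this scaling, which is why a single reference-width table suffices.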
In the context of the $g_m/I_D$ design methodology, the $V_{GS}$ DoF can be replaced with the $g_m/I_D$ ratio in order to zoom in on the region of interest for analog design and, consequently, reduce the LUT size. However, the $g_m/I_D$ vs. $V_{GS}$ characteristics depend on other variables, such as $V_{DS}$ and $V_{SB}$, and the extreme values of $V_{GS}$ are needed for analog/mixed-signal design (e.g., sizing a sampling switch). Thus, building the LUT using the full range of $V_{GS}$ is necessary.
By exploiting the linear scaling property, the LUT is reduced to a 4D array, as depicted in
Figure 2. Each 4D array stores one of the MOSFET parameters, e.g., drain current, small signal parameters, capacitances, noise parameters, etc. The LUT generation process is automated using a computer program that generates the testbench netlists, parses the simulation results, and stores the data in the appropriate structure. The process can be applied to devices of different types and at multiple temperatures and process corners to fully characterize a technology node. The LUT generation process can take up to a few hours per device, but it is carried out only once for a given technology. Thus, it is a one-time sunk cost to generate the LUTs, which can then be used by multiple designers across different projects.
An interpolation operation is required when an off-grid point is queried. This represents an inherent accuracy limitation. Thus, building the LUTs involves a size–accuracy trade-off. The step size used for every DoF is the knob that controls this trade-off. Using a fine step size improves the accuracy but results in a large memory footprint. Although the smart interpolation techniques proposed in [14] can relax this trade-off, the overall memory footprint is still significant, especially when designing a circuit that involves many device types across different corners.
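The step-size/accuracy trade-off can be demonstrated with a small numerical experiment. The exponential characteristic below is a made-up placeholder, not a real device model; the point is only that halving (or quartering) the grid step shrinks the worst-case linear-interpolation error.

```python
import numpy as np

# Size-accuracy trade-off sketch: linearly interpolate a smooth placeholder
# characteristic (NOT a real device model) from grids with different step
# sizes and compare the worst-case relative interpolation error.
def iv(vgs):
    return 1e-6 * np.exp(4.0 * vgs)   # made-up exponential I-V placeholder

dense = np.linspace(0.0, 1.0, 2001)   # reference evaluation points
errors = {}
for step in (0.1, 0.025):             # coarse grid vs. a 4x finer grid
    grid = np.arange(0.0, 1.0 + step / 2, step)
    approx = np.interp(dense, grid, iv(grid))   # 1-D linear interpolation
    errors[step] = np.max(np.abs(approx - iv(dense)) / iv(dense))

print({s: f"{e:.2%}" for s, e in errors.items()})
```

For linear interpolation of a smooth function, the error scales roughly with the square of the step size, which is why modest grid refinement buys substantial accuracy at a steep memory cost in a 4D table.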
4. Results and Discussion
The proposed incomplete-grid implementation of the LUT can result in significant savings in the LUT memory footprint. An overhead exists due to the offset array, but this overhead is negligible, especially when the grid vectors are long. The offset array overhead, expressed as a percentage of the device LUT size, can be written as
$$\text{Overhead} = \frac{S_{\text{offset}}}{N \cdot C \cdot S_{\text{LUT}}} \times 100\%,$$
where $S_{\text{offset}}$ is the size of the offset array, $S_{\text{LUT}}$ is the size of a single incomplete-grid LUT, $N$ is the number of LUTs, i.e., the number of parameters stored for a given device (e.g., $I_D$, $g_m$, $g_{ds}$, etc.), and $C$ is the number of process and temperature corners. Practically, for typical values of $N$ and $C$ and typical lengths of the $L$ and $V_{SB}$ grid vectors, the overhead is less than 0.1%. Note that the offset array is shared among all the LUTs of a given device because all the LUTs use the same grid vectors and the same constraints to filter the full grid and create the incomplete grid. It should also be noted that the offset array does not add a performance overhead because it is precomputed during the LUT generation process and stored with the LUT structure.
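One possible realization of such an offset array is sketched below. The row-wise layout, the `n_valid` counts, and all values are assumed for illustration; the essential idea is that a single precomputed index structure serves every parameter LUT of the device.

```python
import numpy as np

# Sketch of an incomplete-grid layout (assumed implementation): each "row"
# keeps only its first n_valid[i] grid points, the surviving points are
# concatenated into one flat data array, and a precomputed offset array
# marks where each row starts. The offset array is shared by all parameter
# LUTs of the device, since they all use the same filtered grid.
n_valid = np.array([4, 3, 2, 1])                     # valid points per row
offsets = np.concatenate(([0], np.cumsum(n_valid)))  # shared offset array

data = np.arange(offsets[-1], dtype=float)  # flat storage for ONE parameter

def lookup(row, col):
    """Return the stored value at (row, col) of the incomplete grid."""
    assert col < n_valid[row], "point was filtered out of the grid"
    return data[offsets[row] + col]
```

Because `offsets` is built once during LUT generation, each query costs one extra array read, which is consistent with the claim that the offset array adds no runtime overhead.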
Figure 10 compares the sizes of the full-grid LUT and the incomplete-grid LUT. The number of points of the three grid vectors ($V_{GS}$, $V_{DS}$, and $V_{SB}$) is assumed to be the same, i.e., a 3D square array. The figure clearly shows that the incomplete grid can result in significant memory savings. The percentage of memory saving can be calculated as
$$\text{Saving} = \frac{S_{\text{full}} - S_{\text{incomplete}}}{S_{\text{full}}} \times 100\%,$$
and is plotted in Figure 11. As the number of grid vector points increases, the memory saving approaches 67%, i.e., two-thirds of the full-grid LUT size is eliminated.
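The asymptotic two-thirds figure can be checked numerically under an assumed filtering rule. The constraints used below (gate-bulk and drain-bulk voltages kept within a normalized supply) are an illustration only and may differ from the actual grid constraints; they are chosen because they produce the same limiting behavior.

```python
import numpy as np

# Numerical check of the asymptotic saving under an ASSUMED filtering rule:
# keep only the points of an n x n x n voltage grid over [0, V_max]^3 that
# satisfy V_GS + V_SB <= V_max and V_DS + V_SB <= V_max.
def saving(n):
    v = np.linspace(0.0, 1.0, n)  # normalized voltages (V_max = 1)
    vgs, vds, vsb = np.meshgrid(v, v, v, indexing="ij")
    kept = np.count_nonzero((vgs + vsb <= 1.0 + 1e-12) &
                            (vds + vsb <= 1.0 + 1e-12))
    return 1.0 - kept / n**3

for n in (11, 51, 101):
    print(f"n={n:3d}  memory saving={saving(n):.1%}")  # tends toward 2/3
```

For this rule the kept fraction is $\sum_{m=1}^{n} m^2 / n^3 = (n+1)(2n+1)/(6n^2)$, which tends to $1/3$ as $n$ grows, so the saving approaches 67% from below.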
The previous results assume that the grid vectors have a uniform step. In practice, the step may be variable, with a small step used at the beginning of the range and a larger step afterward. The justification for the variable step is that, in analog circuits, devices are usually biased at low to moderate voltage values; thus, more accuracy is needed in this region. Consider the case where the number of points used in the lower part of the range is double the number used in the upper part. The memory savings for this case are plotted in Figure 12. As expected, the savings are smaller than in the uniform-step case, but a significant memory saving (around 50%) can still be achieved.
For rectangular arrays, the incomplete grid may suffer from data loss at the array edges. For example, consider the case shown in Figure 13, where one grid vector has fewer points than the other. A query point may be valid (i.e., it satisfies the grid constraints) yet lie immediately next to the boundary of the valid region. Since its neighboring grid point to the right has been removed, the interpolation process fails to obtain the grid points that surround the query point. The same problem can occur in the downward direction. A simple solution is to always store one extra point after the last valid point. This slightly reduces the memory savings, especially when the grid vectors are short. However, the memory footprint is only problematic when the grid vectors are long, and in that case the extra grid point adds no significant overhead.
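The "one extra point" fix amounts to a one-line adjustment of the per-row valid counts. The sketch below assumes the same illustrative row-wise layout as before; `n_valid` and the row length are example values, not data from the paper.

```python
import numpy as np

# Sketch of the "one extra point" fix, assuming (for illustration) that
# n_valid[i] holds the number of valid points kept in row i of the grid.
n_valid = np.array([4, 3, 2, 1])   # example per-row valid counts
n_cols = 4                         # full-grid row length

# Store one point beyond the last valid one (clipped to the row length) so a
# query just inside the validity boundary still has a right-hand neighbor
# for interpolation.
n_stored = np.minimum(n_valid + 1, n_cols)
```

The clipping at the full row length matters: rows that already reach the array edge cannot (and need not) be padded further.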
It is worth noting that the advantage of using the incomplete-grid LUT is not only the memory savings. The device characterization testbenches can also utilize only the valid points; thus, the LUT generation time will be reduced by the same memory-saving factor. Another advantage is that the time to query data from the LUT will also be reduced due to the smaller LUT size.
In order to demonstrate a design example using the incomplete-grid LUTs, we consider the two-stage CMOS Miller amplifier shown in
Figure 14. The design space of the circuit is generated by substituting the device parameters from the LUTs into symbolic expressions for the circuit’s performance metrics. A database containing 100k design points is generated in 24 s using a standard machine with a quad-core processor and 8 GB of RAM.
Figure 15 illustrates the circuit design space showing the gain-bandwidth product (GBW) vs. the total bias current while applying constraints on the circuit’s DC gain and phase margin (PM). Such a design chart can be used to evaluate the feasibility limits of a given circuit in a given technology and to obtain the Pareto optimal fronts for the design performance metrics.
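A design-space sweep of this kind can be sketched as follows. Everything numerical here is hypothetical: the random parameter ranges stand in for values drawn from the LUTs, and the two-pole expressions ($\text{GBW} = g_{m1}/(2\pi C_c)$, a non-dominant pole near $g_{m2}/(2\pi C_L)$) are textbook Miller-amplifier approximations, not the paper's exact symbolic expressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                              # design points, as in the text

# Hypothetical parameter ranges standing in for LUT lookups (illustration):
gm1 = rng.uniform(1e-5, 1e-3, n)         # first-stage transconductance [S]
gm2 = rng.uniform(1e-4, 1e-2, n)         # second-stage transconductance [S]
cc = rng.uniform(0.1e-12, 5e-12, n)      # Miller compensation capacitor [F]
cl = 1e-12                               # load capacitance [F] (assumed)
a0 = rng.uniform(100, 10_000, n)         # DC gain, linear (assumed range)

gbw = gm1 / (2 * np.pi * cc)             # gain-bandwidth product [Hz]
p2 = gm2 / (2 * np.pi * cl)              # non-dominant pole (simplified)
pm = 90.0 - np.degrees(np.arctan(gbw / p2))  # crude phase-margin estimate

feasible = (20 * np.log10(a0) > 60) & (pm > 60)  # example constraints
print(f"{feasible.sum()} of {n} points satisfy the constraints")
```

Because every metric is a vectorized closed-form expression over LUT-derived quantities, sweeping 100k candidate designs takes seconds rather than the hours a SPICE-in-the-loop sweep would need, which is the efficiency argument made in the text.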