1. Introduction
Active electronically scanned array (AESA) antennas allow the beam shape and direction to be controlled without physically moving the antenna, since the phase and amplitude of the signals driving each radiating element are independently set through a dedicated transmit-receive module (TRM), digitally controlled by a central processor.
The core-chip (CC) is a mixed-signal highly-integrated multi-function MMIC performing all the core controlling functions of a TRM, i.e., path setting, phase shifting, amplitude control, together with low-level amplification at microwave frequencies from S- to Ka-band [
1,
2,
3,
4], while the high-level amplification functionalities are typically entrusted to a second dedicated MMIC, the front-end (FE), nowadays typically implemented in GaN technology [
5,
6]. To achieve accurate beam control, the number of control lines required within a CC can be very high, on the order of tens, and may thus represent a major issue in large AESA systems. Embedding a clocked, synchronized serial-input-parallel-output (SIPO) interface in the CC drastically reduces the number of external pins required to control multiple internal bits, as it can control all the high-frequency analog parts (switches, phase shifters and attenuators) with just a few input lines, as shown in
Figure 1. The large number of transistors required in logic circuits makes the SIPO interface one of the most critical blocks in a CC in terms of yield [
7]. Beyond yield, current consumption, which impacts chip self-heating and reliability [
8], and chip area occupation are also crucial aspects to be optimized, together with digital noise margins, commutation speed and compatibility with the external logic levels, which are further key features that need to be taken into account for the design of the digital interface.
When dealing with digital electronic systems, the highest process maturity is provided by silicon technology, which is well established and adopted worldwide for all kinds of low/medium frequency digital applications. High-frequency Si-based processes, RF-CMOS and SiGe BiCMOS, provide easier implementation of digital circuits, thanks to the availability of complementary devices and a large number of metallization layers, and have advantages in terms of noise margins, cost and yield over GaAs-based technologies [
9,
10,
11,
12]. However, for the same device periphery, GaAs transistors can achieve higher commutation speed than Si ones and at a lower power [
9,
13,
14]. This makes GaAs an appealing alternative to Si for high-speed logic circuits, especially when considering Enhancement/Depletion-mode (E/D-mode) technologies, which offer much more flexibility than D-mode-only processes for digital design. Moreover, GaAs technology ensures radiation hardness, a key feature for space and military applications, and, most importantly, its RF performance, both in terms of noise figure and output power capability, is still appreciably higher than that of Si technologies, which makes it the preferable choice for mixed digital/RF chips, like microwave core-chips.
This paper discusses and compares GaAs SIPO architectures found in the literature, evaluating advantages and disadvantages of the different logic families and architectures adopted.
Section 2 describes the use of the SIPO interfaces in microwave core-chips, summarizing the state-of-the-art, while in the rest of the paper the different building blocks are detailed and discussed. After recalling the overall SIPO architecture in
Section 3, GaAs logic families are presented. Then interface and memory elements are illustrated in
Section 5 and
Section 6, respectively, while
Section 7 concludes and summarizes the work.
2. Literature Overview of SIPO Interfaces in GaAs Microwave Core-Chips
As shown in
Figure 1, a microwave core-chip always includes an
-bit variable attenuator and an
-bit variable phase shifter. The higher the number of control bits, the finer the tuning of the signal amplitude and phase, and hence the higher the accuracy of the beam control, but this also increases the number of signals required to control these circuits. Moreover, one or more switches are often included to share the circuitry between the transmit and/or receive path or to include/exclude amplification stages, requiring
additional control bits. As the number of bits increases, routing the input control lines for each core-chip within an AESA system would rapidly become unsustainably complex; embedding a serial-input-parallel-output interface therefore represents a crucial benefit, if not a mandatory requirement, of microwave core-chips.
The state-of-the-art overview is reported in
Table 1. Along with the number of bits, the designation of the bits within the core-chip is reported, when available: clearly, most of the bits are used for the phase and amplitude control, while, depending on the chip functionalities, up to six bits [
15] are adopted for the switches. It is worth noting that in several cases [
16,
17,
18,
19] a serial output bit is also available, which allows several CCs to be connected in a daisy chain and configured sequentially, and/or the SIPO output to be tested/monitored. In principle, it is sufficient to route the last bit to an output pad, possibly after level shifting/buffering as in [
17]. Even if practically negligible, this represents a slight alteration in the load of the last bit with respect to all other bits, therefore in [
18] an extra bit is included to be used specifically for the data output line. In [
20] the number of SIPO bits is twice that strictly necessary, so as to simultaneously load two different control words, containing, respectively, the receive- and transmit-mode CC configuration and stored in the even and odd SIPO bits. Twelve multiplexers are then used to selectively send the proper word to the attenuator and phase shifter according to the CC operating mode, which is given through an additional external input. This solution is particularly interesting when the CC is expected to switch continuously between the two operating modes without changing the beam characteristics.
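The interleaved-word scheme described for [20] can be illustrated with a short behavioral sketch. It is not taken from that work: the function names, the 12-bit word length (suggested only by the twelve multiplexers) and the assignment of the receive word to the even positions and the transmit word to the odd ones are assumptions made for illustration.

```python
def interleave_words(rx_word, tx_word):
    """Build the double-length register content.

    Assumption: receive-mode bits occupy the even positions and
    transmit-mode bits the odd positions of the SIPO register.
    """
    if len(rx_word) != len(tx_word):
        raise ValueError("both control words must have the same length")
    register = []
    for rx_bit, tx_bit in zip(rx_word, tx_word):
        register.extend([rx_bit, tx_bit])
    return register


def select_word(register, transmit_mode):
    """Model the multiplexer bank: one 2:1 multiplexer per control bit
    picks either the even (receive) or the odd (transmit) register bit."""
    start = 1 if transmit_mode else 0
    return register[start::2]


# Hypothetical 12-bit control words (values chosen arbitrarily).
rx_word = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
tx_word = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1]
register = interleave_words(rx_word, tx_word)
```

Toggling `transmit_mode` changes the word seen by the attenuator and phase shifter without any new serial transfer, which is the benefit highlighted above.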
The most typical logic family adopted is the direct coupled FET logic (DCFL), requiring Enhancement-mode (E-mode) transistors. As detailed in
Section 4, this solution ensures compactness, low device count and hence potentially high yield and low power consumption, especially in the E-mode-only version. The super buffered FET logic (SBFL) has also been proposed, which instead favors noise margins and achieves moderate speed, comparable to that of DCFL. The buffered FET logic (BFL) is the preferred choice when only depletion-mode (D-mode) transistors are available, or when they are deliberately exploited either to enhance speed or to allow higher RF operating frequencies for the analog portion of the CC [
28,
32]. Only two flip-flop architectures are present in the literature, which will be discussed in
Section 6, although in many works this detail is not reported, nor is the clock speed. The latter is indeed not a stringent design constraint, and is thus typically in the range of tens of megahertz, except for [
32] where a simulated bit rate of 3 Gbit/s is reported.
3. Circuit Architecture
The block diagram of an
n-bit SIPO interface is shown in
Figure 2. It is composed of an
n-bit shift register (SR) coupled with an
n-bit hold register (HR), typically completed by
n output buffers (OB), input level shifters (LS) for voltage level compatibility among different logic families and Schmitt trigger input buffers. The minimum number of inputs is 3, namely
data (
D),
clock (
C) and
load (
L) signals, as shown also in
Figure 1. The shift register is loaded one bit at a time (serial input) according to the clock signal. After
n clock cycles the entire CC configuration word is available (parallel output), and, upon reception of the load command, the HR samples in parallel all
n output bits of the SR and stores them until a new load command is received. Clearly, the minimum time between two consecutive CC re-configurations is
n clock cycles, but typically the frequency at which the CC needs to be re-configured is much lower, thus the HR remains in a hold state most of the time. Moreover, the HR does not interface with the external environment, but is internally driven by the SR, i.e., by already synchronized signals, thus the probability of spurious commutations and/or synchronization issues is extremely low, which allows the HR architecture to be simplified. This in practice relaxes the switching speed and consumption requirements for this component with respect to the SR, which is thus the dominant sub-circuit in determining the overall SIPO interface performance. Finally, an array of output buffers takes care of adjusting the signals’ voltage level and driving capability.
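The shift/hold mechanism described in this section can be summarized with a minimal behavioral sketch. It models only the logical behavior (no level shifters, buffers or timing); the class and method names are hypothetical, and the choice of shifting new data in at position 0 is an assumption.

```python
class SipoModel:
    """Behavioral model of an n-bit SIPO: shift register plus hold register."""

    def __init__(self, n_bits):
        self.shift_reg = [0] * n_bits  # SR: updated at every clock cycle
        self.hold_reg = [0] * n_bits   # HR: updated only on a load command

    def clock(self, data_bit):
        """One active clock edge: every bit moves one position along the
        chain and the new data bit enters at position 0 (assumed order)."""
        self.shift_reg = [data_bit] + self.shift_reg[:-1]

    def load(self):
        """Load command: the HR samples all SR outputs in parallel."""
        self.hold_reg = list(self.shift_reg)

    def write_word(self, word):
        """Send a full configuration word: n clock cycles, then one load."""
        for bit in word:
            self.clock(bit)
        self.load()
        return list(self.hold_reg)


sipo = SipoModel(4)
outputs = sipo.write_word([1, 1, 0, 0])
```

With this shifting order the first bit sent ends up in the last register position, so the parallel word is the serial word reversed. While a new word is being shifted in, `hold_reg` keeps the previous configuration, which is why the analog blocks are not disturbed during serial transfers.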
Apart from the input interface, the rest of the SIPO circuit can be seen as
n replicas of a basic building block made of one bit of the SR, one bit of the HR and one output buffer. When designing such a basic
bit-wise block, it is important to optimize the layout for compactness and modularity, to save costs and to make the structure easily extendable to an arbitrary number of bits [
21,
25]. An
n-bit shift register is composed of
n D-type flip-flops (DFFs) connected in a daisy-chain arrangement and controlled by the same clock signal. In principle, the SR operates all the time; in practice, however, some SR shut-down procedures can be included. The simplest one is to suspend the clock signal when unnecessary, thus disabling bit-shifting regardless of the input data signal. This can be done externally or at CC level through a dedicated input signal ANDed with the clock input, as in [
18]. To enhance noise immunity by lowering the probability of spurious transitions, the data line can also be suspended/disconnected in the absence of the clock signal. Moreover, a dedicated power grid can be adopted for the SR, which allows the SR to be switched off when unused, as in [
28]. Since the SR is the core element of the SIPO interface, the DFFs play a crucial role in determining its power consumption, latency and size. All these parameters are interdependent, thus achieving the best trade-off among them is a main design challenge and the most suitable implementation depends on the case, based on which one of these features has to be favored. Most of the SIPO examples available in the literature adopt the same DFFs for the SR and the HR [
20,
24,
25,
27,
29,
31,
32]. This choice is surely the most convenient in terms of design effort and modularity, since, once the DFF cell is optimized, it is just replicated
times to create the two
n-bit registers. However, in order to reduce the overall transistor count, hence reducing chip area occupation and improving yield, the HR can be implemented with simple D-type latches in place of flip-flops as in [
15,
19,
28]. Since latches are sensitive to levels rather than transitions, in order to adopt this solution the load command must be given in the form of a short enable pulse rather than a transition as discussed in
Section 6.
4. GaAs-Based Logic Gates
Gallium arsenide transistors are characterized by a Schottky barrier, which leads to forward gate conduction and voltage clamping and thus to a reduced voltage swing compared to silicon-based MOS technology. This in turn implies worse noise margins and the need for more complicated circuits aimed at avoiding the detrimental effect of gate conduction, also considering the lack of complementary (p-type) devices, due to the large disproportion between hole and electron mobility. On the other hand, the larger transconductance, higher electron velocity and lower parasitic capacitances of GaAs FETs with respect to MOSFETs allow for higher speed and higher drive currents at lower voltage, with an overall lower power consumption [
9,
14].
Many GaAs-based logic families have been developed since the early 80s when D- and then E-mode MESFET technologies became mature enough to provide acceptable yields [
33], as listed in
Table 2.
Early logic families were based exclusively on D-mode FETs (normally-ON devices); however, the high power consumption of even the simplest gate makes them hardly usable for developing complex structures such as a SIPO interface. The availability of E/D-mode processes represented a breakthrough for the integration of digital architectures in GaAs core-chips, and such processes are nowadays the preferred choice, as can be noted from
Table 1. Despite the lower knee voltage and hence lower power consumption of E-mode transistors [
23], E/D processes are still limited with respect to depletion-only ones in terms of frequency and power density [
28], which makes the development of low-power D-mode logic circuits desirable for very high frequency applications.
When only D-mode transistors are available, the buffered FET logic (BFL) is the preferred implementation thanks to its better noise margins and appreciably increased fan-out capability with respect to UFL and SDFL, and the higher frequency achievable with respect to capacitor-based families. As shown in
Figure 3a, reporting the basic inverter (NOT) gate, each logic gate is loaded with a buffer stage (source follower, level-shifting diodes and pull-down transistor) increasing the current driving capability of the gate and performing the necessary level shifting required to drive the next stage, which needs a negative input voltage swing from
, corresponding to a logic 0, to 0 V, corresponding instead to a logic 1. The level-shifting diodes are the most critical element of the BFL, since they are very sensitive to temperature variations and cannot provide current limitation, which is crucial for power saving, considering the typically high saturated currents of the pull-down FET. Fortunately, since the late 90s, the current handling capability of integrated resistors has been significantly improved, and very high sheet resistance values have been achieved, which allow the implementation of high value resistors (HVRs) with very compact layouts. The use of HVRs in place of diodes for level shifting overcomes these limitations and offers the possibility to trade off power consumption against noise margins according to the specific design requirement [
28,
32].
Enhancement-mode transistors are driven by positive gate-source voltages, therefore the negative supply voltage and level shifting circuitry between cascaded gates are not required, which represents a main advantage over D-mode devices. For this reason, also in E/D-mode logic, the logic function is carried out by E-mode FETs, while D-mode ones are only used as active pull-ups or pull-downs in place of resistors. Since E-mode FETs can be directly connected in cascade, the simplest logic family is the direct-coupled FET logic (DCFL), widely adopted (see
Table 1) thanks to its extreme circuit simplicity. As shown in
Figure 3b, the inverter gate is simply a common-source transistor, with the load resistor replaced by a D-mode FET in the E/D version. As anticipated, during the 80s and 90s E-mode-only logic was practically never exploited, due to the large area occupation and power dissipation issues related to resistive pull-ups [
22,
28], while nowadays both versions are exploited: active pull-ups have the advantage of a non-linear resistance response, varying from very small to very high values depending on the drain-source voltage, while resistive pull-ups are beneficial for power consumption, if properly sized, and, above all, for yield. Resistors, in fact, are less subject to process variations than active devices, and in any case, the sensitivity of the logic gate to their exact value is much more relaxed with respect to the sensitivity to, e.g., the transistor threshold voltage. With E-DCFL the lowest transistor count can be achieved: in this case, the number of transistors required to implement a basic logic gate simply equals the number of its inputs. It is worth noting that the DCFL shows a strong pull-down effect, while the pull-up one is weaker, especially if resistive pull-ups are adopted. This yields a pseudo-open drain (POD) situation [
34,
35], with the logic 0 level being strongly asserted, while in the case of a logic 1 output, the transistor is in an open state and, depending on the relation between the total equivalent pull-up and load resistances, the actual output voltage may differ from
.
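The dependence of the logic-1 level on the pull-up and load resistances can be illustrated with a first-order estimate. The sketch below assumes a purely resistive divider between the supply and ground, with the switching transistor open; it deliberately ignores the clamping introduced by the Schottky gate junction of the driven stage. The function name and the numerical values are hypothetical.

```python
def pod_high_level(v_supply, r_pull_up, r_load):
    """First-order logic-1 output voltage of a pseudo-open-drain gate.

    Assumption: with the switching FET open, the output node is a
    resistive divider between the pull-up and the equivalent load.
    """
    if r_pull_up < 0 or r_load <= 0:
        raise ValueError("resistances must be positive")
    return v_supply * r_load / (r_pull_up + r_load)


# Hypothetical values: 1 V supply, 10 kOhm pull-up.
light_load = pod_high_level(1.0, 10e3, 90e3)   # load much larger than pull-up
heavy_load = pod_high_level(1.0, 10e3, 10e3)   # load comparable to pull-up
```

In this simplified picture, the larger the pull-up resistance with respect to the load, the further the logic-1 level drops below the supply, which is the weak pull-up behavior referred to above.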
Noise margins in DCFL are limited by the Schottky junction turn-on voltage and highly sensitive to threshold voltage variations. To improve noise margins, buffered E/D logic families have been introduced so far. The simplest BDCFL gates use a source follower (SF) either as the input (SFED logic) or output (EDSF) stage of a DCFL (the latter is shown in
Figure 3c), which also improves the gate’s fan-out. These improvements come at the price of increased circuit complexity, consumption and delay. The latter limitations have been solved by the super-buffered architecture (SBFL) [
36,
37,
38], also reported in
Figure 3c. In the classical buffer stage, the D-mode transistor acts as a fixed pull-down resistor, whose value should trade off pull-down delay (requiring a small resistance value) against pull-up current consumption (requiring instead a large resistance value). In the SBFL the output stage exploits two E-mode transistors controlled, respectively, by the input and its complement, so that they increase or decrease their resistance in a complementary way, improving both pull-up and pull-down performance and achieving rise and fall times comparable to those of DCFL. A main disadvantage of the SBFL is the high transistor count, with consequent area occupation and power consumption. In particular, since in this case there are two switching transistors directly connected to the input (instead of one in the other logic families), both must be replicated when adding more inputs to create combinational gates.
Indeed, NOR and NAND gates are implemented by simply adding a second switching transistor, respectively, in parallel or in series with that/those of the NOT gate, while the pull-up and buffer elements are not replicated, as shown in
Figure 4, reporting a few examples of NOR and NAND implementations in different logics. Being based on series combination, the NAND is faster and shows lower current consumption than the NOR, but, on the other hand, it implies voltage stacking, which may raise the overall output voltage too much, with consequent interfacing issues with the following stage. When three inputs are needed, the NOR option is thus preferred, even if in [
32] a three-input NAND was exploited to reach an impressive bit-rate thanks to the lower latency.
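The way NOR and NAND gates follow from the inverter can be sketched at switch level. The model below is an idealized illustration, not a circuit simulation: each switching FET is treated as an ideal switch that conducts when its input is 1, the pull-up is assumed to win whenever the pull-down network is open, and the function names are hypothetical.

```python
def nor_gate(*inputs):
    """Switching FETs in parallel: any conducting FET pulls the output low."""
    pull_down_conducts = any(inputs)
    return int(not pull_down_conducts)


def nand_gate(*inputs):
    """Switching FETs in series: the output goes low only if all conduct."""
    pull_down_conducts = all(inputs)
    return int(not pull_down_conducts)


def switching_fet_count(n_inputs):
    """One switching FET per input; the pull-up element is shared."""
    return n_inputs


nor_table = [nor_gate(a, b) for a in (0, 1) for b in (0, 1)]
nand_table = [nand_gate(a, b) for a in (0, 1) for b in (0, 1)]
```

The series stack of the NAND conducts for only one input combination, which is consistent with its lower current consumption, at the price of the stacked voltage drops discussed above.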
The sensitivity to the transistor voltage threshold is still a major concern in both unbuffered and buffered logic families. A possible solution to this limitation is offered by differential structures, whose switching threshold is inherently independent of the individual device threshold voltage. Moreover, they can offer improved noise margins, larger voltage swings and high-speed operation, at the cost of increased area occupation and circuit and routing complexity, since a complementary data line needs to be added. SCFL adopts a source-coupled differential pair, while in [
14] a circuit variant relying on the cascode differential pair is also proposed. Both architectures require additional source followers and level-shifting diodes, thus making the circuit very complex and hence not well suited for highly integrated MMICs [
22]. On the contrary, if a differential approach is exploited at flip-flop level, adopting a differential flip-flop structure driven by both true and complementary data and clock signals, an optimum trade-off among noise margin, transistor count and power consumption can be achieved [
15,
18,
39]. For more details on the various logic families, the reader can refer to [
9,
14,
22,
33] and references therein.
Logic Gates Simulation
Digital circuits must be tested in the time domain through transient simulations. GaAs foundries usually provide their customers with process design kits (PDKs) containing transistors’ non-linear models for simulation with high-frequency CAD tools, while no PDKs are usually available for SPICE(-like) simulators, which would instead be better suited to digital design. The most critical issue, however, is not the simulation environment, since a transient simulation engine is also included in CAD tools, but the transistor models themselves. Empirical analytic models are often the preferred choice for foundry PDKs, as they offer the best accuracy in predicting the behavior of microwave circuits: in fact, such models are extracted by fitting experimental data covering the most relevant RF features, such as non-linear transconductance, self-heating, breakdown, etc., through well-behaved non-polynomial expressions optimized for extrapolation (within reasonable ranges) [
40], therefore giving fast simulation convergence and accurate prediction of device behavior when operated under conditions similar to those used for model extraction. However, a major limitation of any empirical model is its inability to properly describe/predict the behavior of the devices in different operating conditions. In particular, the typical RF device models, which are optimized for harmonic balance simulations in the frequency domain, as this is the natural design domain for microwave circuits, are poorly suited to time-domain simulations, where they can show severe convergence/stability issues leading to very slow simulations or even no convergence at all.
Another issue, highlighted in [
15,
39], is that PDK models typically do not allow interchanging the role of drain and source terminals, i.e., the model is asymmetrical. In some cases, e.g., in processes with field plate, this asymmetry is mandatory, but more often it is just added for convenience, since in microwave circuits the role of these two terminals is always distinct and defined a priori (in some models the source terminal is even internally grounded and not available to the user). Finally, microwave active devices, to be useful in practical design, must be rather large, usually with at least 2 gate fingers and 100 μm total periphery. On the contrary, for digital circuits very small devices, with one finger only and a few microns of gate width, are adopted, which are fully symmetrical, with the role of the drain and source terminals defined instantaneously by the operating conditions. To overcome these issues, ad hoc extracted models can be used, as done in [
24] or in [
15] using, respectively, the TriQuint’s Own Model (TOM) [
41] and a simplified symmetrical Angelov model [
42], both suitable for transient simulations. In both cases the ad hoc models have been fitted starting from the simulation results obtained with the original PDK models, accounting for DC and transient characteristics and for scattering parameters.
Besides device modeling, another critical point for simulating logic gates is the load: even when buffered structures are adopted, the gate needs to be simulated with a realistic load. The load is typically the gate of another FET device (either another logic port or a buffer amplifier), thus the basic input equivalent circuit, composed of properly sized gate-source diode and input capacitance in parallel, can be adopted [
14]. When cascaded ports are expected to be used, the most realistic loading condition of the single gate is a replica of itself [
25].
5. I/O Interfaces
Besides the DC supply, three input signals are required: the data stream, connected only at the first SR bit, the clock signal, distributed throughout the SR, and the load signal, distributed to all the latches/DFF of the HR and working as their enable signal, hence the name
latch enable in some of the reported works, e.g., [
15,
17]. A fourth control line can be added to include other functions, for example in [
15,
18,
27] a chip-select/SR-enable signal and an HR-reset signal are added, respectively, to allow the central unit to disable the clock and possibly data streams and thus to freeze the content of the SR before sending the load command, relaxing its timing requirements, or to instantly revert the CC to a predefined state.
The external signals are typically CMOS/TTL (0–5 V) or LVCMOS/LVTTL (0–3.3 V) signals, while the internal logic may be different, thus requiring level shifting. In particular, D-mode logic gates require negative gate-source voltages to put the transistor in an open state, while E-mode logic accepts positive voltages, but the maximum input voltage swing is limited to roughly 1 V by the Schottky junction turn-on voltage. Moreover, input buffers and/or Schmitt triggers are also often included. Buffers improve the driving capabilities of the control signal, which is particularly important for the
C and
L signals that must drive all bits and hence, depending on the number of bits, a potentially large equivalent capacitance. Moreover, they decouple the input from the logic circuitry, minimizing the current sinking from the input, which can be beneficial also for the
D signal. The inclusion of a hysteresis block, either stand-alone or merged with another input conditioning circuit, filters out noise within a predefined window, appreciably improving the SIPO interface noise margins with respect to spurious fluctuations of the external control signals and providing cleaner internal controls with stable levels and sharp transitions. Finally, in some implementations, complementary clock and/or data signals are required (see
Section 6), which in turn calls for a splitter/inverter circuit to be added at the input.
Level shifting is achieved by means of diodes, or diode-connected transistors with drain and source terminals shorted together, in totem-pole configuration [
14,
15,
22,
31], as shown in
Figure 5a. The diode/transistor type and size must be chosen according to the desired voltage drop: E-mode FETs typically feature smaller saturation currents than D-mode ones and thus allow higher voltage drops, reducing the number of diodes required for level translation, but also the output voltage adjusting resolution. To overcome this limitation, a properly sized resistor can be added in series to the diodes, as shown in
Figure 5a [
15]. Moreover, if the voltage reference
has the same value as the logic-0 input voltage, a parallel resistor may be added to avoid leaving diodes floating as in [
31]. Adopting large devices allows higher currents, resulting in a more stable voltage and a higher driving capability, possibly avoiding the need for a buffer. On the other hand, the level shifter draws the required current directly from the external input signals, thus a high demand may represent an issue in very large AESA systems, where the central unit should feed a huge number of CCs. In these cases, the adoption of a buffer stage is practically mandatory to achieve small input current leakage.
Buffer stages can be simple common source amplifiers [
15]. In this case, the function of the last (output) diode of the level shifter can be accomplished by the gate-source junction of the first buffer’s transistor, as depicted in
Figure 5b. Additionally, with a dedicated buffer, in order to provide high current capability, large devices may be required, which implies large capacitances and thus possible speed limitations. When high-speed operation at high current is required, the number and size of buffer stages must be carefully chosen following a tapered-buffer approach [
43,
44]. For example, in [
15] two buffer stages have been adopted with the second stage’s transistor twice as large as that of the first stage, to ensure fast commutation and preserve input signal polarity. In [
19] an interesting solution is proposed where the level shifting and the (non-inverting) buffering functions are merged together by adopting an E-mode source-follower buffer along with translating diodes and a D-mode active pull-down, as reported in
Figure 5c.
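The tapered-buffer idea can be illustrated with a simple sizing sketch. It follows the textbook geometric tapering rule (each stage larger than the previous one by a constant factor) and is not the procedure of any of the cited works; the function name, the 5 μm first-stage width and the load ratios are hypothetical. Since common-source stages are inverting, an option rounds the stage count up to an even number to preserve polarity.

```python
def tapered_buffer_widths(first_width_um, load_ratio, taper=2.0, non_inverting=True):
    """Gate widths of a geometrically tapered buffer chain.

    load_ratio: load capacitance divided by the input capacitance of the
                first stage (assumed proportional to gate width).
    taper:      width ratio between consecutive stages.
    """
    if first_width_um <= 0 or load_ratio < 1 or taper <= 1:
        raise ValueError("need width > 0, load_ratio >= 1 and taper > 1")
    n_stages = 1
    # Add stages until the last one drives at most `taper` times its own input.
    while taper ** n_stages < load_ratio:
        n_stages += 1
    if non_inverting and n_stages % 2 == 1:
        n_stages += 1  # even number of inverting stages keeps the polarity
    return [first_width_um * taper ** k for k in range(n_stages)]


two_stage = tapered_buffer_widths(5.0, 4.0)
three_stage = tapered_buffer_widths(5.0, 8.0, non_inverting=False)
```

With a taper factor of 2 and a load four times the input capacitance, the sketch returns two stages with the second twice as wide as the first, the same proportion reported for [15].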
Similarly, the output buffers/shifters are required to boost the current driving capability and to adjust the signals’ voltage level to what is required by the analog circuitry. The latter is typically implemented with D-mode devices, hence requiring negative voltages. In [
15,
19,
31] different examples of output buffer are presented, with or without diodes. The best topological solutions for the I/O interfaces are highly dependent on the specific input/output dynamics of the CC under development.
Hysteresis can be obtained through positive feedback by making the total loop gain higher than unity. In [
45], a compact and low-power Schmitt trigger was proposed, based on two cascaded inverter stages, as shown in
Figure 6 (black path). By comparing this structure with the common source buffer shown in
Figure 5b it is clear that the Schmitt trigger can be easily merged with the buffer stage, by simply adding a proper feedback network. Furthermore, double inversion is the typically adopted solution to obtain two strictly symmetrical complementary signals from a single one [
46,
47] (black and blue outputs in
Figure 6). Therefore, in differential structures, the hysteresis and complementary signal generation functions can be performed by the same circuit, while one or more further buffer stages can be added at both inputs, if necessary.
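The noise-filtering effect of hysteresis can be shown with a behavioral sketch of a non-inverting Schmitt trigger that also provides the complementary output. It describes only the input/output behavior, not the circuit of [45]; the function name and the threshold values are hypothetical.

```python
def schmitt_trigger(samples, v_low, v_high, initial_state=0):
    """Return (true, complement) output sequences for a sampled input.

    The output switches to 1 only when the input exceeds v_high and back
    to 0 only when it falls below v_low; inside the window it holds.
    """
    if v_low >= v_high:
        raise ValueError("v_low must be smaller than v_high")
    state = initial_state
    true_out, comp_out = [], []
    for v in samples:
        if state == 0 and v > v_high:
            state = 1
        elif state == 1 and v < v_low:
            state = 0
        true_out.append(state)
        comp_out.append(1 - state)
    return true_out, comp_out


# Hypothetical noisy input (volts) with thresholds at 0.3 V and 0.6 V.
noisy_input = [0.0, 0.45, 0.55, 0.7, 0.55, 0.45, 0.2, 0.45]
q, q_bar = schmitt_trigger(noisy_input, v_low=0.3, v_high=0.6)
```

Fluctuations that stay inside the 0.3 V to 0.6 V window never toggle the output, whereas a single-threshold comparator placed at 0.5 V would switch several times on the same input.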
As a final remark, input signal synchronization is crucial, thus, when many input blocks are required, the best option is to replicate them for all inputs so as to ensure equal/comparable delays for all signals, even if at the price of slightly higher transistor count.
6. Memory Blocks
The simplest 1-bit memory block is the bistable circuit of
Figure 7a composed of two cross-coupled NOT gates. By replacing the NOT gates with NOR or NAND gates, a set/reset latch can be obtained, where the bit to be stored can be fixed externally. For the NOR- and NAND-based structures to remain in the hold (memory) state, the inputs must be both 0 or both 1, respectively. To obtain a D latch, an input section is added controlling how and when the set and reset commands are sent to the latch. This prevents the set and reset commands from being applied simultaneously and adds a latch enable/disable function, which can be used as a level-controlled timing (clock,
C) mechanism. By using NAND or NOR gates, active-high (i.e., transparent for
) and active-low (i.e., transparent for
) D latches can be obtained, respectively. Some possible active-high configurations are shown in
Figure 7.
Figure 7b reports the basic differential-input D latch, while
Figure 7c shows how to connect the input ports to obtain instead a single input latch with minimized gate count avoiding the additional NOT port to create
from
D. Note that these two architectures also work with NOR gates, but become active-low. Finally, it is to be noted that in the case of DCFL implementation, the pseudo-open drain (POD) effect can be exploited to simplify the circuit and reduce the transistor count, as done in [
15,
18]. As shown in
Figure 7d, with proper sizing of the pull-up resistors, the input NANDs can be in practice considered in an open-drain state when the enable signal
C is low, hence leaving unaffected the following memory block. The latter can be therefore implemented with simple NOT gates, eliminating two transistors with respect to
Figure 7b. When instead the enable signal is high, the complementary outputs of the NAND ports override the memorized state, without compromising the proper operation of the cross-coupled NOTs.
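The operation of an active-high NAND-based D latch can be checked with a small gate-level sketch. It uses the textbook four-NAND gated latch (two input NANDs followed by a cross-coupled NAND pair), assumed here to correspond to the classical structure; the complement of D is generated inside the function for brevity, and the cross-coupled pair is simply iterated a few times until it settles. Names are hypothetical.

```python
def nand(a, b):
    return int(not (a and b))


def d_latch(d, c, q_prev):
    """Active-high D latch: transparent for c = 1, hold for c = 0."""
    d_bar = 1 - d
    set_n = nand(d, c)        # goes low to set the output (d = 1, c = 1)
    reset_n = nand(d_bar, c)  # goes low to reset the output (d = 0, c = 1)
    q, q_bar = q_prev, 1 - q_prev
    for _ in range(3):        # let the cross-coupled pair settle
        q, q_bar = nand(set_n, q_bar), nand(reset_n, q)
    return q


# Truth table as (d, c, previous q) -> new q
table = {(d, c, q0): d_latch(d, c, q0)
         for d in (0, 1) for c in (0, 1) for q0 in (0, 1)}
```

When `c` is 0 both input NANDs output 1, which is the hold condition of the NAND-based set/reset pair, so the stored bit is unaffected by `d`.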
Latches are sensitive to levels and thus update their outputs continuously as long as
(if active-high). However, in a shift register, the data must be transferred from one bit to the following one according to the clock rate. The inputs must thus be sampled at specific instants, corresponding to either the rising or falling edge of the clock signal. The memory block that performs this function is the D flip-flop. A D flip-flop can be built from two D latches controlled by complementary clocks: this is the common master-slave (MS) configuration, shown in
Figure 8a, which is controlled by the falling clock edge (the roles of C and its complement must be exchanged to obtain a positive-edge-triggered DFF). Being connected to complementary clock signals, the two latches are never both transparent, which achieves the required transition-controlled behavior.
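A behavioral sketch of the master-slave principle may help here. It is not a model of the circuit in Figure 8a: the classes `DLatch` and `MasterSlaveDFF` are hypothetical, the latches are ideal, and the complementary clock is assumed to be perfectly non-overlapping.

```python
class DLatch:
    """Ideal active-high D latch: transparent while enable is 1, holds otherwise."""

    def __init__(self):
        self.q = 0

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Two latches on complementary clocks; the output changes on the falling edge of C."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def step(self, d, c):
        m = self.master.step(d, c)        # master transparent for C = 1
        return self.slave.step(m, 1 - c)  # slave transparent for C = 0


dff = MasterSlaveDFF()
stimulus = [(1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]  # (D, C) pairs
outputs = [dff.step(d, c) for d, c in stimulus]
print(outputs)  # [0, 0, 0, 0, 1, 1, 1, 0]
```

The output only takes the value that D had just before C went from 1 to 0; changes of D while C is high or low do not propagate, since one of the two latches is always opaque.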
The use of differential structures, which require both true and complementary inputs, can improve noise margins, especially in the case of reduced-swing signals [
48]. As with the complementary clocks, the complementary data signals must be generated so as to be non-overlapping, to ensure proper operation of the flip-flop. This comes at the price of a larger occupied area and possibly increased power consumption, hence requiring careful optimization. The works reported in [
15,
18,
39] exploit a differential MS (D-MS) flip-flop composed of two identical D-type latches of the kind reported in
Figure 7d. Thanks to the pseudo-open drain nature of the NAND gates, the total transistor count is kept relatively low: in the case of an E-DCFL implementation, only 12 transistors (excluding those for complementary clock generation), instead of the 16 required with the classical latch structure of
Figure 7b (or
Figure 7c).
Master-slave DFFs can also use two different latch architectures for the two latch cells, separately optimizing each one [
33]. In general, MS DFFs require a rather high number of transistors. To overcome this issue, ad hoc optimized flip-flop architectures should be evaluated or developed, such as the modified MS flip-flop in [
46], which shares one gate between the two latches. In particular, for GaAs implementations, a well-established optimized structure with single input, single clock driving and minimized gate count is the 6-NOR D flip-flop [
14], shown in
Figure 8b. As can be noted from
Table 1, this is one of the most widely adopted flip-flop architectures. It is composed of a simple set/reset latch as output stage and 2 interacting input latches, for a total of 5 two-input NORs and 1 three-input NOR. In [
32], where the main target was speed, a 6-NAND structure [
8] was preferred. The operating principle is equivalent to that of the 6-NOR one, except that it is sensitive to the opposite clock transition (negative-edge-triggered with NORs and positive-edge-triggered with NANDs). Assuming an E-DCFL implementation, the minimum number of transistors required to implement this DFF is 13.
Table 3 compares the transistor count of the two analyzed DFF structures for all the most relevant logic families, highlighting the advantage of the 6-NOR DFF in the case of buffered logic families.
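The E-DCFL transistor counts quoted in the text (16, 12 and 13) can be reproduced with a short tally. The sketch below assumes a cost model of one transistor per gate input with a passive load, and assumes the gate lists given in the comments; both are inferred from the figures quoted above rather than taken from Table 3, and the names used are hypothetical.

```python
# Assumed E-DCFL cost model: one transistor per gate input, passive (resistive) load.
TRANSISTORS_PER_GATE = {"NOT": 1, "NAND2": 2, "NOR2": 2, "NOR3": 3}


def transistor_count(gates):
    """Sum the transistors of a circuit given as {gate_type: number_of_gates}."""
    return sum(TRANSISTORS_PER_GATE[g] * n for g, n in gates.items())


classic_latch = {"NAND2": 4}          # assumed gate list for a Figure 7b-type latch
pod_latch = {"NAND2": 2, "NOT": 2}    # assumed gate list for a Figure 7d-type latch
six_nor_dff = {"NOR2": 5, "NOR3": 1}  # 6-NOR DFF: 5 two-input NORs, 1 three-input NOR

ms_dff_classic = 2 * transistor_count(classic_latch)  # two identical latches
ms_dff_pod = 2 * transistor_count(pod_latch)
six_nor_total = transistor_count(six_nor_dff)

print(ms_dff_classic, ms_dff_pod, six_nor_total)  # 16 12 13
```

Transistors for complementary clock generation are excluded from the MS figures, as in the text. Other logic families add transistors per gate (e.g., for buffering), so the ranking can change, which is the point made by Table 3.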
As anticipated in
Section 3, at the SIPO level the overall transistor count can be reduced by adopting DFFs for the SR and simple latches for the HR [
15,
19,
28]. In this way, the total number of transistors per bit is reduced to roughly 81% for the 6-NOR DFF case and down to 75% for the MS DFF case. However, latches are sensitive to levels rather than transitions, while the HR must typically respond to an edge of the external load input signal
L. To adopt this solution, the load command must be given in the form of a short enable pulse rather than a transition. Better still, since generating, distributing and level-shifting such a short pulse to all CCs in a phased array is practically impossible due to noise fluctuations, the external
L edge should be converted into a pulse by a dedicated circuit on board the CC. In [
15] an example of such a circuit is reported: it is based on an analog
filter (delay element) and requires only 5 active devices. The length of the enable pulse must be carefully optimized: it should be as short as possible to prevent possible instability issues, but at the same time long enough to guarantee proper data loading. In any case, it must not exceed half of the clock period; otherwise, spurious data changes in the HR are likely.
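The per-bit savings and the pulse-length constraint can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the HR latch is taken to cost 8 transistors (a four-NAND latch in E-DCFL), the reference is a SIPO using DFFs in both SR and HR, and the function names and the parameter `t_load_min` (minimum pulse length for reliable loading) are hypothetical.

```python
def per_bit_ratio(dff_transistors, latch_transistors):
    """Transistors per bit with a latch-based HR, relative to DFFs in both SR and HR."""
    return (dff_transistors + latch_transistors) / (2 * dff_transistors)


def enable_pulse_ok(t_pulse, t_clock, t_load_min):
    """True if the pulse is long enough to load data and at most half a clock period."""
    return t_load_min <= t_pulse <= t_clock / 2


LATCH = 8  # assumed: latch made of four two-input NANDs in E-DCFL

print(round(per_bit_ratio(13, LATCH), 2))  # 6-NOR DFF case: 0.81
print(round(per_bit_ratio(16, LATCH), 2))  # MS DFF case: 0.75

# Hypothetical timing values in seconds, for illustration only.
print(enable_pulse_ok(t_pulse=2e-9, t_clock=10e-9, t_load_min=1e-9))  # True
print(enable_pulse_ok(t_pulse=6e-9, t_clock=10e-9, t_load_min=1e-9))  # False
```

Under these assumptions the 21/26 and 24/32 ratios match the roughly 81% and 75% figures given in the text.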