2.1. Analysis Criteria
To proceed with the theoretical analysis of the scaling laws, some assumptions must be made. First, the SPAD pixel array is assumed to be a square grid, although the discussion can readily be generalized to other configurations, e.g., a honeycomb structure [10]. Second, circular SPADs are assumed, which simplifies the discussion of how the curvature changes with scaling. Some prior works adopt rectangular or square SPADs with rounded corners to improve the fill factor [11,12,13]. However, these designs are not always suitable for scaling with preserved geometric similarity, because the electric field concentrates at the corners, which can induce premature edge breakdown and also shift the breakdown voltage. Third, a 3D-stacked configuration with a SPAD-only array in a single plane is assumed.
In a non-3D-stacked FSI or BSI configuration, the SPAD and pixel circuits coexist in the same plane. For a given pixel pitch, the SPAD and circuit must share the limited area, so the circuit complexity can affect the size of the SPAD active area and its performance. The main focus of this analysis is to formulate the scaling laws of SPAD performance; hence, an array consisting only of SPADs, without circuit components, is preferred for a more systematic and quantitative analysis. Fourth, the active-to-active distance is assumed to be fixed at a certain dimension, irrespective of the scaling parameter. This assumption is justified by the following discussion.
For the analysis of the scaling laws in the SPAD pixel, it is natural to assume that the doping profile along the z-axis for each implantation layer is unchanged and that the breakdown voltage of the p-n junction in the SPAD remains in the same range. This implies that, unlike in MOS transistor scaling, where a lower supply voltage is adopted for smaller devices, the power supply voltage for the SPAD does not scale with its dimensions. Another premise in SPAD pixel design is that the guard-ring width is sufficiently large to avoid premature edge breakdown. Because the lateral diffusion length of the implanted dopants cannot readily be controlled by design, the electrostatic potential distribution around the guard ring does not depend on the active diameter. The optimum guard-ring width, which ensures that no edge breakdown occurs under the operating conditions, is defined by the following condition [14]:
$$V_{\mathrm{BD,GR}}(w_{\mathrm{GR}}) > V_{\mathrm{BD}} + V_{\mathrm{ex,max}}$$
where $V_{\mathrm{BD,GR}}$ is the breakdown voltage at the guard ring for the given guard-ring width $w_{\mathrm{GR}}$, $V_{\mathrm{BD}}$ is the breakdown voltage at the vertical p-n junction, and $V_{\mathrm{ex,max}}$ is the maximum excess bias used in the system. Based on the discussion above, none of the terms in this condition depends on the pixel size, so the optimum $w_{\mathrm{GR}}$ can be defined regardless of scaling. These considerations impose a constraint on pixel scaling: the guard-ring width must remain unscaled, fixed at a certain value over all SPAD pixel dimensions, to guarantee stable Geiger-mode operation without unwanted edge breakdown. The optimum $w_{\mathrm{GR}}$ should be comparable to the depletion width of the main SPAD p-n junction and is typically 1 to 2 μm [15]. In addition, the optimum width of an isolation layer, typically formed by deep-well implantation, is determined by the process design rule for the minimum drawn width and should not be scaled with the pixel dimensions. The pixel pitch $p$, which will be employed as the scaling parameter in the following discussion, can be expressed as:
$$p = d_{\mathrm{a}} + d_{\mathrm{aa}}, \qquad d_{\mathrm{aa}} = 2w_{\mathrm{GR}} + w_{\mathrm{iso}}$$
where the well-sharing configuration is assumed, $d_{\mathrm{a}}$ is the active diameter, $d_{\mathrm{aa}}$ is the active-to-active distance, and $w_{\mathrm{iso}}$ is the isolation width. In the following discussion, $w_{\mathrm{GR}}$ and $w_{\mathrm{iso}}$ are both assumed to be 1 μm unless otherwise noted, and $p$ is assumed to depend solely on $d_{\mathrm{a}}$.
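The pitch decomposition can be illustrated with a few lines of Python. This is a minimal sketch, assuming the well-sharing relation $p = d_{\mathrm{a}} + 2w_{\mathrm{GR}} + w_{\mathrm{iso}}$ (guard ring counted twice and one isolation width per pitch, as in the p+/NW description of Figure 1b) and the 1 μm defaults; the function name `active_diameter_um` is hypothetical.

```python
def active_diameter_um(pitch_um: float, w_gr_um: float = 1.0, w_iso_um: float = 1.0) -> float:
    """Active diameter d_a for a given pixel pitch p (all lengths in um).

    Assumes the well-sharing relation p = d_a + d_aa with d_aa = 2*w_GR + w_iso,
    where w_GR and w_iso stay fixed (unscaled) for every pitch.
    """
    d_aa_um = 2.0 * w_gr_um + w_iso_um  # active-to-active distance, fixed under scaling
    if pitch_um <= d_aa_um:
        raise ValueError(
            f"pitch {pitch_um} um leaves no room for an active area (d_aa = {d_aa_um} um)"
        )
    return pitch_um - d_aa_um


if __name__ == "__main__":
    # Shrinking p by 1 um shrinks d_a by exactly 1 um, because d_aa is fixed.
    for p in (10.0, 5.0, 4.0):
        print(p, active_diameter_um(p))
```

Because $d_{\mathrm{aa}}$ is a fixed offset, the active diameter shrinks one-for-one with the pitch rather than in proportion to it, which is what drives the fill-factor loss at small pitches.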
Figure 1 shows conceptual views of SPAD pixel scaling. Figure 1a shows an example top-view layout for a 2 × 2 pixel array. As discussed above, the active-to-active distance $d_{\mathrm{aa}}$ is fixed when the pixel pitch $p$ is shrunk. As a result, the active diameter $d_{\mathrm{a}}$ decreases by the same amount as $p$. This assumption can be applied to any type of existing SPAD device structure [16,17]. For example, Figure 1b shows the cross-sectional view of a p+/NW SPAD. Here, $d_{\mathrm{a}}$ is defined as the diameter of the inner circle of the guard-ring p-well, whereas $d_{\mathrm{aa}}$ corresponds to the sum of the NW separation width and twice the width of the p-well guard ring. For a PW/deep-NW SPAD or a p-i-n SPAD, $d_{\mathrm{a}}$ equals the diameter of the p-well, and $d_{\mathrm{aa}}$ is the sum of the NW separation width and twice the width of the virtual p-epi guard ring. This indicates that the scaling law analysis can be performed with only three key dimensional parameters, $p$, $d_{\mathrm{a}}$, and $d_{\mathrm{aa}}$, without loss of generality.
In summary, the main assumptions for the analysis of scaling laws are:
a uniform square grid,
a circular shape for the active area and inner/outer borders of the guard ring,
a 3D-stacked configuration with full separation of the SPAD and pixel circuit into different wafers,
an active-to-active distance unscaled with the SPAD pixel dimension, and
the pixel pitch employed as a scaling parameter.
2.2. Formulation of Scaling Laws
2.2.1. Fill Factor
The FF of a SPAD pixel, defined as the ratio of the drawn active area to the pixel area, is one of the fundamental parameters determining the single-photon sensitivity. FF is a purely geometric parameter and can be formulated straightforwardly as a function of the pixel pitch $p$:
$$\mathrm{FF} = \frac{\pi d_{\mathrm{a}}^2}{4p^2} = \frac{\pi}{4}\left(1 - \frac{d_{\mathrm{aa}}}{p}\right)^2$$
It is clear from the above equation that FF goes to zero when $p = d_{\mathrm{aa}}$ and cannot be defined for $p < d_{\mathrm{aa}}$. For sufficiently large $p$, FF converges to $\pi/4 \approx 78.5\%$.
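The FF expression $\mathrm{FF} = (\pi/4)(1 - d_{\mathrm{aa}}/p)^2$ can also be inverted to find the smallest pitch reaching a target FF. The sketch below assumes the circular active area on a square grid described above; the helper names are hypothetical.

```python
import math


def fill_factor(pitch_um: float, d_aa_um: float) -> float:
    """FF = (pi/4) * (1 - d_aa/p)^2 for a circular active area on a square grid."""
    if pitch_um < d_aa_um:
        raise ValueError("FF is undefined for p < d_aa")
    return (math.pi / 4.0) * (1.0 - d_aa_um / pitch_um) ** 2


def min_pitch_for_ff(target_ff: float, d_aa_um: float) -> float:
    """Smallest pitch reaching target_ff, from inverting the FF expression.

    Only targets below the pi/4 ceiling are reachable.
    """
    if not 0.0 <= target_ff < math.pi / 4.0:
        raise ValueError("target FF must lie in [0, pi/4)")
    return d_aa_um / (1.0 - math.sqrt(4.0 * target_ff / math.pi))


if __name__ == "__main__":
    for d_aa in (2.0, 3.0):
        print(d_aa, round(min_pitch_for_ff(0.5, d_aa), 2))
```

For $d_{\mathrm{aa}} = 3$ μm (the 1 μm defaults), FF ≥ 50% requires a pitch of about 14.8 μm, versus about 9.9 μm for $d_{\mathrm{aa}} = 2$ μm, which quantifies the slower saturation for larger active-to-active distances.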
Figure 2 shows the calculated FF as a function of the pixel pitch for several active-to-active distances. FF increases monotonically with the pixel pitch $p$: it rises relatively steeply at smaller $p$ and saturates at larger $p$. The slower saturation for larger $d_{\mathrm{aa}}$ indicates that, if the active-to-active distance is large, a larger pixel pitch is required to obtain a high FF, e.g., above 50%. In actual sensor design, the effective FF can be enhanced by employing on-chip microlenses [18,19], although designers should bear in mind that microlenses are less effective for main objective lenses with smaller f-numbers.
2.2.2. PDP and PDE
The PDP of a SPAD pixel is defined by the following equation [20]:
$$\mathrm{PDP} = \eta \cdot P_{\mathrm{tr}}$$
where $\eta$ is the quantum efficiency and $P_{\mathrm{tr}}$ is the avalanche triggering probability. In ideal SPAD devices, PDP represents the single-photon sensitivity normalized by the active area, and it does not scale with the active diameter or the pixel pitch. In practice, a discrepancy between the “drawn” active area and the “effective” active area leads to a considerable dependence of PDP on the scaling parameter $p$ [21].
The discrepancy between the designed and actual active size stems from two possible sources: nonideality in the fabrication process and nonideality in the device design. One example of process nonideality is the lateral diffusion of doped ions [22]. The lateral diffusion length is determined by the dopant species, implantation energy, and thermal annealing conditions, and is typically on the order of 0.1 to 1 μm for deep-well implantation. This lateral diffusion decreases the doping concentration at the edge of the active area. The electric field at the edge of the active area can therefore be locally reduced with respect to that at the center, lowering the sensitivity at the border of the active area.
Device design nonideality, on the other hand, is caused by a lateral electric field near the guard ring. Photocharges generated in the neutral region of the SPAD move randomly by thermal diffusion until they reach a nearby depletion region and are drifted to an electrode. If the photocharges reach the main p-n junction with its high electric field, they induce avalanche multiplication, thereby generating a photon detection signal. However, photocharges close to the border of the active area can reach the depletion region toward the guard ring before reaching the main junction. In such a case, the carriers do not cause avalanche multiplication, and no photon detection signal is observed. This so-called “border effect” [23,24] causes photon detection loss at the edge of the active area, which becomes more significant in smaller pixels.
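For both process- and device-originated nonidealities, PDP can be corrected by introducing an inactive radius $\Delta r$, representing the effective width of the photon-insensitive region at the edge of the active region [25]. The corrected scaling law for PDP is given by:
$$\mathrm{PDP} = \mathrm{PDP}_{\max}\left(\frac{d_{\mathrm{a}} - 2\Delta r}{d_{\mathrm{a}}}\right)^2 = \mathrm{PDP}_{\max}\left(1 - \frac{2\Delta r}{p - d_{\mathrm{aa}}}\right)^2, \qquad p \geq d_{\mathrm{aa}} + 2\Delta r$$
where $\mathrm{PDP}_{\max}$ is the virtual maximum PDP for a sufficiently large active size.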
Figure 3 shows the calculated PDP as a function of the pixel pitch for different $\Delta r$. The curve with $\Delta r = 0$ μm, corresponding to the ideal case with no border effect, shows no dependence on $p$. For finite $\Delta r$, PDP starts from zero at $p = d_{\mathrm{aa}} + 2\Delta r$, then grows and saturates to $\mathrm{PDP}_{\max}$ with increasing $p$. Similar to the scaling law for FF, a slower increase is observed for larger $\Delta r$.
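The inactive-radius correction is easy to evaluate numerically. This sketch assumes the form $\mathrm{PDP} = \mathrm{PDP}_{\max}\,((d_{\mathrm{a}} - 2\Delta r)/d_{\mathrm{a}})^2$ with $d_{\mathrm{a}} = p - d_{\mathrm{aa}}$, i.e., a photon-insensitive annulus of width $\Delta r$ on a circular active area; the function name and the example values are illustrative only.

```python
def pdp(pitch_um: float, d_aa_um: float, dr_um: float, pdp_max: float) -> float:
    """PDP = PDP_max * ((d_a - 2*dr) / d_a)^2 with d_a = p - d_aa (lengths in um).

    dr is the inactive radius (width of the photon-insensitive rim).
    Returns 0 when the rim consumes the whole drawn active area.
    """
    if dr_um < 0.0:
        raise ValueError("inactive radius must be non-negative")
    d_a = pitch_um - d_aa_um
    d_eff = d_a - 2.0 * dr_um  # effective (photon-sensitive) diameter
    if d_eff <= 0.0:
        return 0.0
    return pdp_max * (d_eff / d_a) ** 2


if __name__ == "__main__":
    # Hypothetical example: d_aa = 3 um, dr = 0.5 um, PDP_max = 0.4.
    for p in (4.0, 7.0, 13.0, 23.0):
        print(p, round(pdp(p, 3.0, 0.5, 0.4), 4))
```

With $\Delta r = 0.5$ μm and $d_{\mathrm{aa}} = 3$ μm, a 7 μm pixel retains only $(3/4)^2 \approx 56\%$ of $\mathrm{PDP}_{\max}$, while $\Delta r = 0$ returns $\mathrm{PDP}_{\max}$ at every pitch.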
PDE is another indicator of single-photon sensitivity. Unlike PDP, where the sensitivity is normalized by the active area, PDE is the single-photon sensitivity normalized by the pixel area. The following equation holds [20]:
$$\mathrm{PDE} = \mathrm{FF} \cdot \mathrm{PDP}$$
Based on the previous equations, PDE can be explicitly formulated as:
$$\mathrm{PDE} = \frac{\pi}{4}\,\mathrm{PDP}_{\max}\left(1 - \frac{d_{\mathrm{aa}} + 2\Delta r}{p}\right)^2$$
Figure 4 shows the calculated PDE as a function of $p$ for different $\Delta r$. Similar to FF and PDP, the curves start from zero at smaller $p$ and saturate at larger $p$. The maximum PDE is given by $\pi\,\mathrm{PDP}_{\max}/4$ in the limit $p \to \infty$. Again, introducing on-chip microlenses can potentially increase the overall PDE.
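A quick numerical check that the closed-form PDE equals FF times PDP can be written as below. It is a sketch assuming $\mathrm{FF} = (\pi/4)(1 - d_{\mathrm{aa}}/p)^2$ and $\mathrm{PDP} = \mathrm{PDP}_{\max}((d_{\mathrm{a}} - 2\Delta r)/d_{\mathrm{a}})^2$, valid for $p \geq d_{\mathrm{aa}}$; the function names and parameter values are hypothetical.

```python
import math


def fill_factor(p: float, d_aa: float) -> float:
    # FF = (pi/4) * (1 - d_aa/p)^2, valid for p >= d_aa
    return (math.pi / 4.0) * (1.0 - d_aa / p) ** 2


def pdp(p: float, d_aa: float, dr: float, pdp_max: float) -> float:
    # PDP with a photon-insensitive rim of width dr on the drawn diameter d_a = p - d_aa
    d_a = p - d_aa
    d_eff = max(d_a - 2.0 * dr, 0.0)
    return pdp_max * (d_eff / d_a) ** 2 if d_eff > 0.0 else 0.0


def pde_closed_form(p: float, d_aa: float, dr: float, pdp_max: float) -> float:
    """PDE = (pi/4) * PDP_max * (1 - (d_aa + 2*dr)/p)^2, zero below p = d_aa + 2*dr."""
    x = max(1.0 - (d_aa + 2.0 * dr) / p, 0.0)
    return (math.pi / 4.0) * pdp_max * x ** 2


if __name__ == "__main__":
    for p in (5.0, 8.0, 12.0, 30.0):
        product = fill_factor(p, 3.0) * pdp(p, 3.0, 0.5, 0.4)
        print(p, round(product, 5), round(pde_closed_form(p, 3.0, 0.5, 0.4), 5))
```

The two columns agree because the drawn active area cancels: FF contributes $d_{\mathrm{a}}^2/p^2$ and PDP contributes $(d_{\mathrm{a}} - 2\Delta r)^2/d_{\mathrm{a}}^2$, leaving only the effective diameter relative to the pitch.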
2.2.3. DCR
DCR has several origins, such as band-to-band tunneling, trap-assisted tunneling, trap-assisted thermal generation, and the diffusion current [26,27]. Experimentally, the sources of DCR can be classified based on an Arrhenius plot [28,29,30]. In silicon SPADs, the activation energies $E_{\mathrm{a}}$ for band-to-band tunneling, trap-assisted tunneling, trap-assisted thermal generation, and the diffusion current are known to be approximately 0, 0–0.55, 0.55, and 1.1 eV, respectively. In practice, the measured $E_{\mathrm{a}}$ can take intermediate values, e.g., 0.8 eV, indicating a mixture of multiple DCR components.
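An Arrhenius classification of this kind amounts to fitting $\ln(\mathrm{DCR})$ against $1/(k_{\mathrm{B}}T)$; the negative slope is $E_{\mathrm{a}}$. The sketch below uses an ordinary least-squares fit in plain Python; the function name, temperatures, and DCR amplitudes are hypothetical and chosen only to illustrate how a mixture yields an intermediate activation energy.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def activation_energy_ev(temps_k, dcr_cps):
    """Least-squares Arrhenius fit: ln(DCR) = ln(C) - E_a / (k_B * T).

    Returns E_a in eV. Needs at least two distinct temperatures.
    """
    if len(temps_k) != len(dcr_cps) or len(set(temps_k)) < 2:
        raise ValueError("need matching lists with at least two distinct temperatures")
    xs = [1.0 / (K_B_EV * t) for t in temps_k]  # 1/(k_B T) in 1/eV
    ys = [math.log(d) for d in dcr_cps]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope of ln(DCR) vs 1/(k_B T) is -E_a


if __name__ == "__main__":
    temps = [273.15, 293.15, 313.15, 333.15]
    # Hypothetical mixture of a 0.55 eV (thermal generation) and a 1.1 eV (diffusion) component.
    dcr = [1e10 * math.exp(-0.55 / (K_B_EV * t)) + 1e19 * math.exp(-1.1 / (K_B_EV * t))
           for t in temps]
    print(f"fitted E_a = {activation_energy_ev(temps, dcr):.2f} eV")
```

A single-mechanism data set returns its own activation energy, whereas a mixture returns a value between the component energies, in line with the intermediate values observed in practice.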
Under the assumption that premature edge breakdown is suppressed, the tunneling components at the edge of the active region can be neglected. The contributions of thermal generation and the diffusion current in the depletion region toward the guard ring are also negligible, because the electric field there is insufficient for the generated carriers to trigger an avalanche. Therefore, the contribution from the main p-n junction of the SPAD dominates over that from the edge of the active region. Interestingly, all the aforementioned DCR components are proportional to the “effective” active area.
The tunneling current, whether band-to-band or trap-assisted, is proportional to the total volume of the high-field region, which is clearly proportional to the active area. Thermally generated and diffusion carriers are detected only when they are generated in the vicinity of the active region. Assuming that thermal generation and the diffusion current are spatially uniform around the active region, these components are also naturally proportional to the active area. The scaling law for DCR can thus be formulated as follows:
$$\mathrm{DCR} = \mathrm{DCR}_{A}\,\frac{\pi (d_{\mathrm{a}} - 2\Delta r)^2}{4} = \mathrm{DCR}_{A}\,\frac{\pi (p - d_{\mathrm{aa}} - 2\Delta r)^2}{4}$$
where $\mathrm{DCR}_{A}$ is the DCR per unit (effective) active area.
Figure 5 shows the calculated DCR as a function of $p$ for different DCRs per unit area $\mathrm{DCR}_{A}$. Starting from 0 cps at $p = d_{\mathrm{aa}} + 2\Delta r$, the DCR increases parabolically with $p$. The DCR is highly dependent on $\mathrm{DCR}_{A}$, which is a function of the excess bias, temperature, and process quality, such as the trap and impurity densities. In contrast to FF, PDP, and PDE, a smaller pixel pitch is desirable for better DCR performance. The designer should therefore find the best tradeoff between PDE and DCR to determine the optimum $p$ that provides a reasonable S/N ratio.
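The parabolic growth of DCR with pitch can be sketched as follows, assuming $\mathrm{DCR} = \mathrm{DCR}_{A}\,\pi(p - d_{\mathrm{aa}} - 2\Delta r)^2/4$ with $\mathrm{DCR}_{A}$ in cps/μm². The function name and the value $\mathrm{DCR}_{A} = 0.1$ cps/μm² used below are hypothetical placeholders, not measured data.

```python
import math


def dcr_cps(pitch_um: float, d_aa_um: float, dr_um: float, dcr_per_um2: float) -> float:
    """DCR = DCR_A * pi * (p - d_aa - 2*dr)^2 / 4.

    dcr_per_um2 is the DCR per unit effective active area (cps/um^2);
    the result is 0 cps below p = d_aa + 2*dr.
    """
    d_eff = max(pitch_um - d_aa_um - 2.0 * dr_um, 0.0)  # effective active diameter
    return dcr_per_um2 * math.pi * d_eff ** 2 / 4.0


if __name__ == "__main__":
    # Hypothetical DCR_A = 0.1 cps/um^2, d_aa = 3 um, dr = 0.
    for p in (5.0, 10.0, 20.0):
        print(p, round(dcr_cps(p, 3.0, 0.0, 0.1), 2))
```

Because DCR scales with the effective area while PDE saturates, doubling the pitch from 10 to 20 μm in this example raises DCR by a factor of $(17/7)^2 \approx 5.9$, which is the tradeoff the designer has to weigh.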
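The DCR density $\rho_{\mathrm{DCR}}$ is defined as the DCR normalized by the drawn active area and is often used to compare the SPAD process quality of devices fabricated in different processes [31]. As with PDP, nonidealities such as the border effect make $\rho_{\mathrm{DCR}}$ depend on $p$ as follows:
$$\rho_{\mathrm{DCR}} = \frac{4\,\mathrm{DCR}}{\pi d_{\mathrm{a}}^2} = \mathrm{DCR}_{A}\left(1 - \frac{2\Delta r}{p - d_{\mathrm{aa}}}\right)^2$$
At larger $p$, the DCR density saturates to $\mathrm{DCR}_{A}$.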
Figure 6 shows the $p$ dependence of the DCR density for various $\Delta r$. As can be seen from its similarity to the equation for PDP, the DCR density starts from zero at $p = d_{\mathrm{aa}} + 2\Delta r$, then increases rapidly and saturates at larger $p$. This implies that, in actual measurements, the DCR density can be underestimated at smaller pixel pitches because of the photon-insensitive region at the edge of the active region.
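The underestimation can be quantified, and in principle undone if $\Delta r$ is known, with a small sketch. It assumes $\rho_{\mathrm{DCR}} = \mathrm{DCR}_{A}(1 - 2\Delta r/(p - d_{\mathrm{aa}}))^2$; both function names are hypothetical, and the correction is only as good as the estimate of $\Delta r$.

```python
def dcr_density(pitch_um: float, d_aa_um: float, dr_um: float, dcr_per_um2: float) -> float:
    """Measured DCR divided by the drawn active area pi*d_a^2/4.

    Equals DCR_A * (1 - 2*dr/(p - d_aa))^2, so it underestimates DCR_A
    when the photon-insensitive rim dr is a sizable fraction of d_a.
    """
    d_a = pitch_um - d_aa_um
    if d_a <= 0.0:
        raise ValueError("p must exceed d_aa")
    ratio = max(1.0 - 2.0 * dr_um / d_a, 0.0)
    return dcr_per_um2 * ratio ** 2


def corrected_dcr_a(measured_density: float, pitch_um: float, d_aa_um: float, dr_um: float) -> float:
    """Recover DCR_A from a measured (drawn-area) density, given an estimate of dr."""
    ratio = 1.0 - 2.0 * dr_um / (pitch_um - d_aa_um)
    if ratio <= 0.0:
        raise ValueError("no effective active area left for this dr")
    return measured_density / ratio ** 2


if __name__ == "__main__":
    # Hypothetical: true DCR_A = 1 cps/um^2, d_aa = 3 um, dr = 0.5 um.
    for p in (5.0, 7.0, 13.0, 53.0):
        print(p, round(dcr_density(p, 3.0, 0.5, 1.0), 3))
```

In this example a 7 μm pixel reports only about 56% of the true $\mathrm{DCR}_{A}$, while a 53 μm pixel reports about 96%, so comparing densities across very different pitches can be misleading.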
Note that the above discussion is based on the assumption that the guard-ring width is optimized to avoid edge breakdown over the entire range of pixel pitches. In actual device design, an abrupt increase of DCR and DCR density is sometimes observed at smaller pixel pitches, even with a fixed active-to-active distance. To the best of our knowledge, no systematic analysis of this phenomenon has been conducted. One possible reason is the enhanced curvature at the edge of the active region, which induces a high electric field near the guard ring. Analogously to antennas, the electric field tends to increase in regions of high curvature, which may induce premature edge breakdown when the pixel is scaled down. Another possible explanation is nonideality in the photoresist patterning process.
In most SPAD devices, the diffusion regions for the p-n junction, guard ring, or isolation are formed by well implantation, where high-energy ion implantation is employed. In such a process, a thicker photoresist is needed to prevent the accelerated ions from penetrating the resist. The opening size of such a thick resist (typically 3 to 10 μm) requires careful calibration to match the actual shape and size to the designed layout. Layout rules for well implantation usually support only 0° and 90° edges, whereas a SPAD layout often involves circular or ring shapes with arbitrary angles. This can cause the actual resist opening size to deviate from the design, especially at smaller pixel dimensions, leading to unwanted edge breakdown.
2.2.4. Afterpulsing Probability
Correlated noise, such as afterpulsing and crosstalk, is critical for applications where the temporal and spatial correlations of photon detection signals play key roles [32,33,34]. Afterpulsing is caused by an avalanche-generated carrier captured at a deep trap state near the multiplication region; the carrier is released by thermal activation or tunneling after a trapping time of nanoseconds to microseconds and induces another avalanche multiplication event. This mechanism implies that the afterpulsing probability $P_{\mathrm{ap}}$ depends on the trap density $N_{\mathrm{t}}$ and the total number of avalanche-generated carriers $N_{\mathrm{av}}$. A higher trap density and more avalanche carriers result in a higher $P_{\mathrm{ap}}$. If $P_{\mathrm{ap}}$ is not overly large, e.g., smaller than 10%, a linear relation between $P_{\mathrm{ap}}$ and $N_{\mathrm{av}}$ can be assumed as a first-order approximation [35].
Assuming a spatially uniform distribution of the deep trap states, $N_{\mathrm{t}}$ is independent of the scaling parameter. $N_{\mathrm{av}}$, on the other hand, can depend on the scaling parameter. $N_{\mathrm{av}}$ is calculated as follows:
$$N_{\mathrm{av}} = \frac{C_{\mathrm{tot}} V_{\mathrm{ex}}}{e} = \frac{(C_{\mathrm{j}} + C_{\mathrm{p}})\,V_{\mathrm{ex}}}{e}$$
where $e$ is the elementary charge; $V_{\mathrm{ex}}$ is the excess bias; $C_{\mathrm{tot}}$ is the total parasitic capacitance at the SPAD output node (either cathode or anode) connected to the quenching resistor; $C_{\mathrm{j}}$ is the p-n junction capacitance of the active region; and $C_{\mathrm{p}}$ is the sum of the other parasitic capacitance contributions from connected metal wires, diffusion regions, gates, etc. $C_{\mathrm{j}}$ is proportional to the active area, whereas $C_{\mathrm{p}}$ does not scale with the pixel size or the active size. In summary, the scaling law of $P_{\mathrm{ap}}$ is given by:
$$P_{\mathrm{ap}} = A\left(\frac{\varepsilon\,\pi (p - d_{\mathrm{aa}})^2}{4 w_{\mathrm{d}}} + C_{\mathrm{p}}\right)$$
where $A$ is a temperature-, bias-, and process-dependent coefficient; $\varepsilon$ is the permittivity; and $w_{\mathrm{d}}$ is the effective depletion width determined by the p-n junction doping profile.
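The capacitance bookkeeping behind $N_{\mathrm{av}}$ and $P_{\mathrm{ap}}$ can be sketched with a parallel-plate estimate of $C_{\mathrm{j}}$. This sketch assumes a silicon relative permittivity of 11.7, lengths in μm, capacitances in farads, and a lumped coefficient `coeff_a` standing in for $A$ (trap density, bias, and temperature); all function names and example values are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_R_SI = 11.7           # relative permittivity of silicon (assumed)
Q_E = 1.602176634e-19     # elementary charge, C


def junction_capacitance_f(d_a_um: float, w_d_um: float) -> float:
    """Parallel-plate estimate C_j = eps * pi * d_a^2 / (4 * w_d), in farads."""
    area_m2 = math.pi * (d_a_um * 1e-6) ** 2 / 4.0
    return EPS_R_SI * EPS0 * area_m2 / (w_d_um * 1e-6)


def avalanche_carriers(c_tot_f: float, v_ex: float) -> float:
    """N_av = C_tot * V_ex / e."""
    return c_tot_f * v_ex / Q_E


def afterpulsing_probability(pitch_um: float, d_aa_um: float, w_d_um: float,
                             c_p_f: float, coeff_a: float) -> float:
    """P_ap = A * (C_j(p) + C_p); coeff_a (1/F) lumps trap density, bias, temperature."""
    c_j = junction_capacitance_f(pitch_um - d_aa_um, w_d_um)
    return coeff_a * (c_j + c_p_f)


if __name__ == "__main__":
    c_j = junction_capacitance_f(7.0, 1.0)  # d_a = 7 um, w_d = 1 um
    print(f"C_j = {c_j * 1e15:.2f} fF")
    print(f"N_av = {avalanche_carriers(c_j + 30e-15, 3.0):.3g}")  # C_p = 30 fF, V_ex = 3 V
```

For a 7 μm active diameter and a 1 μm depletion width, $C_{\mathrm{j}}$ is only about 4 fF, so with $C_{\mathrm{p}} = 30$ fF the fixed parasitics dominate and roughly $6.4 \times 10^5$ carriers flow per avalanche at $V_{\mathrm{ex}} = 3$ V; this is why a larger $C_{\mathrm{p}}$ weakens the relative pitch dependence.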
Figure 7 shows the $p$ dependence of the afterpulsing probability for various parameter values (dashed lines for $C_{\mathrm{p}} = 5$ fF and solid lines for $C_{\mathrm{p}} = 30$ fF). For all parameter combinations, $P_{\mathrm{ap}}$ increases parabolically from an offset corresponding to $C_{\mathrm{p}}$. A larger $C_{\mathrm{p}}$ results in a weaker relative dependence of $P_{\mathrm{ap}}$ on $p$, indicating a smaller contribution of the p-n junction capacitance to the total parasitic capacitance $C_{\mathrm{tot}}$. In any case, scaling down the pixel has a positive impact on the afterpulsing probability because of the reduced parasitic capacitance.
Note that the dead time is assumed to be constant for all $p$ in this analysis. In a real device with a fixed quenching resistance, the recharge time constant grows with $C_{\mathrm{tot}}$, so the dead time depends on $p$. Because a longer dead time gives trapped carriers more time to be released before the SPAD is re-armed, this secondary effect makes $P_{\mathrm{ap}}$ less sensitive to $p$ than in the constant-dead-time case. If the dependence of $P_{\mathrm{ap}}$ on the dead time is strong enough to compensate for the trend shown in Figure 7, it may be possible to flatten or even reverse the trend of $P_{\mathrm{ap}}$ at larger $p$.
2.2.5. Crosstalk Probability
Crosstalk is another type of correlated noise in SPAD pixels. Unlike afterpulsing, which involves only a single pixel, crosstalk involves two or more pixels. When avalanche multiplication is triggered in a pixel, thousands to millions of electrons and holes are generated. When these carriers recombine with their counterpart charges, either photons or phonons are emitted to conserve energy. Because silicon has an indirect bandgap, the probability of photon emission is very low: only several to tens of photons with energies above the silicon bandgap are emitted per one million avalanche-generated carriers [36,37]. However, those photons can travel toward a neighboring pixel and be detected there.
Similar to afterpulsing, the crosstalk probability $P_{\mathrm{ct}}$ depends on the number of avalanche-generated carriers $N_{\mathrm{av}}$: a larger number of carriers leads to a higher $P_{\mathrm{ct}}$. Again, to a first-order approximation, $P_{\mathrm{ct}}$ is considered proportional to $N_{\mathrm{av}}$. In addition, the distance between pixels is another important factor for scaling. Given that the intensity of the emitted secondary photons decays exponentially with the travel length, a shorter pixel-to-pixel distance can result in higher crosstalk. The emitter-to-receiver distance dependence of crosstalk can be approximated by [38]:
$$P_{\mathrm{ct}}(r) = B\,\frac{\exp(-r/\lambda)}{r^2}$$
where $B$ is a coefficient explained below, $r$ is the distance from the SPAD of interest to the other SPAD, and $\lambda$ is the effective decay length of the emitted light. For crosstalk between two nearest-neighbor SPAD pixels, $r$ in the above equation corresponds to the pixel pitch $p$. Note that this equation implicitly assumes that light emission occurs at the center of the active region of the emitting SPAD, and that the average photon intensity reaching the active region of the receiver is approximated by the intensity at the center of the receiving SPAD's active region. In reality, the finite size of the active regions of both the emitter and the receiver may cause the measured crosstalk to deviate slightly from this model. For simplicity, the following analysis is based on the above model, neglecting the effect of the finite active size.
The coefficient $B$ depends on both the emitter and receiver characteristics. On the emitter side, $B$ should depend on the total number of emitted photons, which is proportional to $N_{\mathrm{av}}$. On the receiver side, $B$ should also be correlated with the sensitivity of the receiver. The probability of detecting an emitted photon is proportional to the PDP and the active area, whose product equals PDE times $p^2$ by definition. Thus, the crosstalk probability between two nearest-neighbor SPAD pixels can be expressed as:
$$P_{\mathrm{ct}} = \kappa\left(\frac{\varepsilon\,\pi (p - d_{\mathrm{aa}})^2}{4 w_{\mathrm{d}}} + C_{\mathrm{p}}\right)\mathrm{PDE}\,\exp(-p/\lambda)$$
where $\kappa$ is an excess-bias-dependent coefficient.
Figure 8 shows the $p$ dependence of the calculated crosstalk probability for various $\lambda$ and $C_{\mathrm{p}}$. All curves increase for $p$ close to its lower limit. For larger $p$, either increasing or decreasing trends are observed, depending on the parameter set. The curves with $1/\lambda = 0.2$ and $0.1$ μm$^{-1}$ decrease toward zero, whereas the curve with $1/\lambda = 0.05$ μm$^{-1}$ increases monotonically (for the $C_{\mathrm{p}}$ values given in Figure 8). Note that $1/\lambda = 0.2$ μm$^{-1}$ and $0.05$ μm$^{-1}$ correspond to effective light emission wavelengths of 700 and 850 nm, respectively. In contrast to afterpulsing, the crosstalk probability does not necessarily depend monotonically on $p$; the impact of pixel miniaturization is highly dependent on the combination of model parameters.
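The non-monotonic behavior can be reproduced qualitatively with a short script. This is a sketch under stated assumptions: the nearest-neighbor form $P_{\mathrm{ct}} = \kappa\,(C_{\mathrm{j}} + C_{\mathrm{p}})\,\mathrm{PDE}\,\exp(-p/\lambda)$, in which the $1/r^2$ spreading of an isotropic emitter cancels the $p^2$ factor from the receiver area; a silicon permittivity of 11.7 ε₀; and hypothetical defaults ($d_{\mathrm{aa}} = 3$ μm, $\Delta r = 0.1$ μm, $w_{\mathrm{d}} = 1$ μm, $\mathrm{PDP}_{\max} = 0.5$, $\kappa = 1$). The output is in arbitrary units.

```python
import math

EPS_SI = 11.7 * 8.8541878128e-12  # assumed silicon permittivity, F/m


def c_j_ff(p_um: float, d_aa_um: float, w_d_um: float) -> float:
    """Parallel-plate junction capacitance of the active area, in fF."""
    d_a_m = (p_um - d_aa_um) * 1e-6
    return EPS_SI * math.pi * d_a_m ** 2 / (4.0 * w_d_um * 1e-6) * 1e15


def pde(p_um: float, d_aa_um: float, dr_um: float, pdp_max: float) -> float:
    """PDE = (pi/4) * PDP_max * (1 - (d_aa + 2*dr)/p)^2, zero below its lower limit."""
    x = max(1.0 - (d_aa_um + 2.0 * dr_um) / p_um, 0.0)
    return (math.pi / 4.0) * pdp_max * x ** 2


def crosstalk(p_um: float, inv_lambda_per_um: float, c_p_ff: float, d_aa_um: float = 3.0,
              dr_um: float = 0.1, w_d_um: float = 1.0, pdp_max: float = 0.5,
              kappa: float = 1.0) -> float:
    """Nearest-neighbor P_ct = kappa * (C_j + C_p) * PDE * exp(-p/lambda), arbitrary units."""
    emitter = c_j_ff(p_um, d_aa_um, w_d_um) + c_p_ff       # ~ number of avalanche carriers
    receiver = pde(p_um, d_aa_um, dr_um, pdp_max)           # sensitivity of the neighbor
    return kappa * emitter * receiver * math.exp(-inv_lambda_per_um * p_um)


if __name__ == "__main__":
    for inv_lam in (0.2, 0.05):
        print(inv_lam, [round(crosstalk(p, inv_lam, 5.0), 3) for p in (10.0, 20.0, 30.0, 40.0)])
```

With $1/\lambda = 0.2$ μm$^{-1}$ the exponential attenuation wins at large pitch and the curve falls, whereas with $1/\lambda = 0.05$ μm$^{-1}$ the growing junction capacitance and PDE dominate over this range, mirroring the two regimes described for Figure 8.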
Several countermeasures can be considered to suppress crosstalk. First, lowering $V_{\mathrm{ex}}$ helps to suppress the crosstalk probability at the expense of PDP and PDE. Because $V_{\mathrm{ex}}$ affects both the emission from the emitter and the sensitivity of the receiver, the crosstalk probability follows a square law with respect to $V_{\mathrm{ex}}$. Second, forming opaque deep trench isolation (DTI) can suppress crosstalk. Trench materials with a lower refractive index can reflect the emitted photons and thus confine them within the emitter. This could improve the crosstalk probability by an order of magnitude.
2.2.6. Power Consumption
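The avalanche-originated power consumption in large-scale SPAD arrays is a key parameter, as it grows in proportion to the number of pixels. The total power consumption in a SPAD array depends on the incident photon flux. For a systematic comparison, the following discussion focuses on the energy consumed per single avalanche event, $E_{\mathrm{av}}$, in a single SPAD pixel. The power consumption of the readout circuits is not taken into account here. $E_{\mathrm{av}}$ is the product of the avalanche charge $eN_{\mathrm{av}}$ and the SPAD bias voltage $V_{\mathrm{BD}} + V_{\mathrm{ex}}$, expressed as follows:
$$E_{\mathrm{av}} = eN_{\mathrm{av}}(V_{\mathrm{BD}} + V_{\mathrm{ex}}) = \beta\left(\frac{\varepsilon\,\pi (p - d_{\mathrm{aa}})^2}{4 w_{\mathrm{d}}} + C_{\mathrm{p}}\right), \qquad \beta = V_{\mathrm{ex}}(V_{\mathrm{BD}} + V_{\mathrm{ex}})$$
where $\beta$ is a bias-dependent coefficient and $V_{\mathrm{BD}}$ is the breakdown voltage of the SPAD. Apart from the details of the coefficient, the equation has the same structure as that of the afterpulsing probability. Naturally, the calculated trend of the single-event energy consumption $E_{\mathrm{av}}$ as a function of $p$ in Figure 9 is similar to that in Figure 7.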
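The per-event energy and its extension to an array-level figure can be sketched as follows, assuming $E_{\mathrm{av}} = C_{\mathrm{tot}} V_{\mathrm{ex}} (V_{\mathrm{BD}} + V_{\mathrm{ex}})$ and ignoring readout circuits. The function names and the example values ($C_{\mathrm{tot}} = 35$ fF, $V_{\mathrm{ex}} = 3$ V, $V_{\mathrm{BD}} = 20$ V, $10^5$ cps per pixel, $10^6$ pixels) are hypothetical illustrations, not figures from this analysis.

```python
def energy_per_avalanche_j(c_tot_f: float, v_ex: float, v_bd: float) -> float:
    """E_av = (C_tot * V_ex) * (V_BD + V_ex): avalanche charge times SPAD bias, in joules."""
    return c_tot_f * v_ex * (v_bd + v_ex)


def avalanche_power_w(c_tot_f: float, v_ex: float, v_bd: float,
                      count_rate_cps: float, n_pixels: int) -> float:
    """Array-level avalanche power for a uniform per-pixel count rate (readout ignored)."""
    return energy_per_avalanche_j(c_tot_f, v_ex, v_bd) * count_rate_cps * n_pixels


if __name__ == "__main__":
    e_av = energy_per_avalanche_j(35e-15, 3.0, 20.0)
    print(f"E_av = {e_av * 1e12:.3f} pJ")
    print(f"P = {avalanche_power_w(35e-15, 3.0, 20.0, 1e5, 1_000_000):.4f} W")
```

In this example each avalanche costs about 2.4 pJ, and a one-megapixel array counting $10^5$ events per second per pixel would dissipate roughly 0.24 W in avalanches alone, which shows why reducing $C_{\mathrm{tot}}$ through pixel scaling matters for large arrays.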
2.2.7. Timing Jitter
The timing jitter of a SPAD is determined by multiple factors, such as the device configuration, doping profile, detection threshold, excess bias, and temperature, and it is not straightforward to formulate a scaling law for it. Qualitatively, a larger pixel pitch produces a higher timing jitter for several reasons. First, the spatial expansion of the avalanche multiplication process takes more time for a larger $d_{\mathrm{a}}$ because of the finite lateral avalanche propagation velocity [39]. Second, a larger $d_{\mathrm{a}}$ results in a slower rise of the output voltage because of the larger parasitic capacitance, leading to enhanced statistical variability. Further systematic analysis is needed for a deeper understanding of timing jitter scaling.
2.2.8. Summary of Scaling Law Analysis
In the above sections, the scaling laws of the key SPAD characteristics with pixel dimensions were investigated. Miniaturization of the SPAD pixel improves DCR, afterpulsing, power consumption, and timing jitter, whereas it has an adverse effect on the fill factor, PDP, and PDE. The equations for the scaling laws are summarized in Table 1. In particular, degradation of the single-photon sensitivity is inevitable in conventional SPAD pixels when the pitch becomes smaller than 10 μm. Further technological breakthroughs are required for SPAD pixel miniaturization toward multi-megapixel arrays.