1. Introduction
Photon-counting imaging with time-correlated single-photon counting technique (TCSPC) has obtained much research interest due to its high time resolution and sensitivity [
1,
2,
3,
4,
5,
6]. Among other advantages, photon-counting imaging allows to operate in extremely low light level environment [
7,
8], which is of significant help for improving image quality when signal intensity is restricted, such as in remote imaging or low-reflectivity target imaging. For photon-counting LIDAR applied in such weak echo scenario, it generally requires a long acquisition time to not only suppress false alarms caused by noise counts but also to obtain sufficient signal photons to reconstruct clear images. However, such requirement would obviously limit imaging efficiency.
Methods to promote imaging efficiency (less dwell time) at low light level have been previously investigated by some researchers, and their methods can generally be divided into two categories. The first category is to set acquisition time for each pixel fixed and to use a variety of prior information such as correlation of adjacent pixels or the number of depth clusters to recover depth image [
9,
10,
11,
12,
13]. Methods of this category are essentially not optimal for high efficiency imaging because signal photon levels are actually different from pixel to pixel, and the fixed acquisition time is either redundant for bright (high reflectivity) pixels or inadequate for dark (low reflectivity) pixels. The other category is to scan each pixel with a photon level-dependent time. A well-known method of this category is first photon imaging (FPI) [
14,
15,
16] which utilizes only the first detected photon to reconstruct images. However, according to its
supplementary material [
15], in the conducted experiment, average signal photon and noise photon per pulse were 0.09 and 0.1, respectively. This signal to noise ratio (SNR) is almost 0 dB and is too high to be achieved for many applications such as daylight imaging or underwater imaging where solar noise or backscattered noise is high. As would be demonstrated later, FPI is not suited to low SNR environments where most of its first photon counts would be due to uninformative noise.
For photon-counting LIDARs used in extremely low signal photon environments with significant noise, an efficient imaging method with noise robustness is needed. In this article, we reported a first signal photon unit (FSPU) method which utilizes the different statistics of noise counts and signal counts to rapidly distinguish signal photon clusters. In our method, more pulses are transmitted to illuminate dark pixels, while less pulses are transmitted to bright pixels to maintain both efficiency and image quality. The method is able to reduce acquisition time sharply compared to a traditional photon-counting algorithm such as the peak and cross-correlation method, and it also performed better than FPI, especially in low SNR scenarios. The method does not require any of the background noise calibrations that need to be conducted in the state-of-the-art photon-efficient methods by D. Shin et al. [
9] and J. Rapp et al. [
12]. Both simulation and experiment were conducted to verify the feasibility of the method.
2. Methods
The overall photon-counting process is the Poisson point process [
17], and it could be viewed as the sum of two independent Poisson processes, that is, signal inhomogeneous Poisson process and noise homogenous Poisson process. The rate of signal Poisson process is actually concentrated within pulse width, whereas the rate of noise Poisson process is flat. Considering the low light level environment, pile-up effect [
18] or dead time effect could be ignored, and, accordingly, the time tags of signal counts and noise counts would obey Gaussian distribution (we suppose laser pulse waveform is Gaussian) and uniform distribution, respectively, as
Figure 1 indicates.
According to Gaussian theory, the variance of signal time tags would be equal to where denotes root mean square (RMS) pulse width. Variance of noise time tags, from uniform distribution theory, would be 1/12 , where is the pulse repetition period. For , the most evident difference is that the variance in signal counts is rather tiny, while variance in noise counts is large, which means signal counts tend to cluster together and noise photons would occupy the whole temporal duration.
Exploiting this characteristic, we developed our FSPU method with two aspects; acquisition strategy and reconstruction strategy. The essence of acquisition strategy is straightforward and it can be briefly interpreted that we will not transfer to the next pixel until the first signal photon unit of current pixel has been distinguished. More specifically, we define two acquisition parameters, unit size
and unit range
. For each laser pulse, time tags of photon counts, if any, will be registered to system, and when the registered photon counts are found to be clustered, that is,
photon counts within
range in our method, we believe these
counts are all due to signal photons and they are defined as FSPU, and then we transfer to next pixel.
Figure 2 demonstrates our method.
Reconstruction strategy consists of three steps. The first step is to estimate range time through maximum likelihood estimation (MLE). Since we believe distinguished FSPU is caused by signal photons, the MLE of expectation of Gaussian is actually the average of FSPU as Equation (1) shows.
Subscript
represents the
-th row
-th column pixel and
means the time tag of
-th count of FSPU. The second step is anomaly censorship. Although noise counts generally would not cluster, there is a chance that noise counts would meet the
-
criteria earlier than signal counts, and thusly they are falsely deemed as FSPU. In this step, for pixel
, we will calculate the median value
of its
neighborhood and compare the pixel value
with this median. If the pixel value is
away from median, it is believed to be noisy. The underlying reason is that 95% signal counts would be within
around expectation for Gaussian distribution. The noisy pixels would be replaced with their medians as Equation (2) demonstrates.
The third step is total variation (TV) regularization [
19,
20]. We add a TV penalty term to our estimated time image, as:
where
is an
size matrix whose
-th row
-th column element is
and
is the size of target image.
TV is defined as:
Equation (3) is actually a minimum optimization problem where is a coefficient to make a trade-off between MLE data and correlations of adjacent pixels. In experiment, we used and its optimal value is the focus of our future research. After acquisition of reconstructed time image , range image is obtained by .
3. Simulations
To quantify our method performance, we first used the Middlebury dataset [
21] to conduct simulations.
Figure 3a describes the distribution of signal counts numbers per pulse
and
Figure 3b is the truth range image
. Signal counts were generated from an inhomogeneous Poisson process with
as its rate where RMS width
was set as 0.6 ns to maintain consistency with the laser used in our later experiment. Noise counts were generated from a homogeneous Poisson process with noise intensity
as its rate. Image size
and
were set to be
and 200 ns, respectively, in simulations.
were chosen as 0.01 MHz, 0.1 MHz, 0.25 MHz, 0.5 MHz, 0.75 MHz and 1 MHz, consecutively, to generate several different noise levels. The mean
SNR is evaluated as:
In the simulation, we chose
and
to be 5 and 1.2 ns, respectively, which meant our FSPU was defined as five counts within 1.2 ns temporal width. The reconstruction processes with
are demonstrated in
Figure 4.
We also reconstructed depth images using other methods such as peak, cross-correlation, unmixing [
12] and FPI [
13] to compare their performances with our FSPU method. Mean square error (
MSE), defined as:
is utilized as a measure of reconstruction performance where
is true depth of pixel
. Efficiency is measured through the number of emitted pulses per pixel (ppp). Less pulses means less acquisition time and, thus, higher efficiency. Although numbers of pulses for each pixel vary for our FSPU and FPI, we can calculate the average number of pulses per pixel to evaluate efficiency, that is, total emitted pulses number divided by pixel size.
First, we ran our FSPU method, and the number of emitted pulses per pixel was obtained through above-mentioned calculation. To compare performances of these methods with a common baseline, we restricted the number of emitted pulses for each pixel of peak, cross-correlation and unmixing to be identical to our FSPU, which means we restricted the efficiency of these three methods to be the same as our method. The result is displayed in
Figure 5.
Although FPI is the most efficient method (FPI only needs 1 cpp to reconstruct images and therefore the number of emitted pulses is the lowest), it has poor performance, especially when SNR is low. For example, when SNR is lower than 0 dB, MSEs of FPI are large and the reconstructed images are almost nonsense because most of its first photon counts are noisy and they convey no range information. Emitted pulses number for the first four methods seems much larger than FPI; however, as signal is weak, the signal detections are essentially as few as several counts. The peak method is a traditional photon-counting imaging method which chooses the position of histogram peak as rang time. The cross-correlation method is to make a cross correlation between system instrument response function (IRF) and histogram, and then it finds the position of maximum value.
From
Figure 5, we could see that MSEs of these three fixed acquisition time methods are larger than our method, which indicates the reconstruction performance of our method is much better. The result is intuitive because the pulses number is identical for each pixel for these three methods, which does not take pixel differences into consideration.
Actually, when SNR decreases, the image quality of our method worsens, as
Figure 5 shows. For low SNR conditions, we can improve the quality by increasing
or decreasing
to further avoid false clusters, although this may result in lower efficiency. Monte Carlo simulations were conducted to demonstrate how
and
influence efficiency and robustness, as
Figure 6 indicates.
Here we set noise intensity as 0.25 MHz, signal photon number as 0.01 per pulse and RMS laser width as 0.6 ns. If counts of a FSPU are all due to noise, it is deemed as a false alarm. From
Figure 6, we could notice that larger
and smaller
would promote robustness, but reduce efficiency. In fact, the method allows us to make a trade-off between efficiency and noise robustness through
and
. The method is actually an extension of FPI. When
, it would be simplified to be FPI, which only focuses on imaging efficiency. In addition, background noise calibration is not required in our method. We only need to know an approximate level of noise to provide information to choose
and
, instead of measuring accurate noise distribution of each pixel which is generally a time-consuming task. Experience shows
and
would be reliable choices for a variety of scenarios.
4. Experiments
To explore the performance of the method in a real imaging environment, we carried out experiments in a laboratory. The photon-counting LIDAR system is designed as
Figure 7 demonstrates.
Laser (CryLaS FDSS532) emits pulses at the fixed repetition rate of 10 KHz. Mirror 1 deflects the laser beam at 90° into a linear polarizer. As the laser beam is linearly polarized from the laser, rotating the polarizer can adjust pulse energy afterwards. Two lenses, L1 and L2, are jointly used as beam expander to lower down laser divergence. A beam splitter deflects 1% beam into a PIN photodiode, the signal from which serves as start baseline. The remaining 99% beam continues its travel through a perforated mirror and then emits out to scenes of interest by two galvo scanning mirrors (Thorlabs GVS012). The back reflected echo photons enter the system in the same way and as the reflected beam is diffusely scattered, most of it would be deflected by the PERM into the receiving part. NBF moves away noise photons which are out of emitting wavelength. Eventually, the echo photons are coupled into a multimode fiber that connects to the Gm-APD (Micro Photon Devices PDM series $PD-050-CTC-FC whose photon detection efficiency is about 42% for 532 nm wavelength and dead time is 77 ns) at the other end. A gate of 200 ns was utilized to block system back-reflection light.
A football, a man-made mannequin, a box and the background wall were about 4.0 m, 5.5 m, 5.8 m and 6.1 m away from the front of our photon-counting LIDAR, respectively. The setup was placed in dark laboratory environment and an incandescent lamp with adjustable illumination was utilized as the source of noise. Details of the experiment are included in
Appendix A. During the experiment, the echo level was extremely low. The signal counts level, on average, was about 0.0074 per pulse which meant that every 135 pulses would only generate 1 signal count. Three noise levels, the corresponding SNR of which were −5.14 dB, −8.35 dB and −10.85 dB, were achieved through the adjustment of the lamp. The result of the experiment is demonstrated in
Figure 8.
Acquisition of the ground truth depth image and signal-noise evaluation processes are available in
Appendix A. From the result, our method shows better MSEs compared to others. FPI, without any doubt, is the most efficient method, but it suffers much larger error to be almost noninformative. Results of the experiment have the same tendency as simulation. For the same total acquisition time (which means the same efficiency), our method could recover higher quality depth image, which, on the other hand, reveals that to obtain the same quality image, our method would require much less time. The experiment data and code are provided as we show in the
Supplementary Material.
It is worth noting that MSEs of the unmixing method are also tiny, almost the same as our FSPU method. However, the unmixing method needs accurate noise levels to reconstruct clear images, while our method does not require such information. In this experiment, noise levels were evaluated beforehand and if noise levels are unknown, the unmixing method would have much worse reconstruction quality.
5. Discussion and Conclusions
In conclusion, we reported an efficient and noise robust photon-counting imaging method which is specifically suitable for low signal and high noise environments such as daylight imaging or underwater imaging. The method exploits the statistical difference that signal would cluster while noise would decentralize, in order to distinguish signal counts rapidly. The acquisition time of each pixel is a photon level-dependent variable and background noise calibration is not needed any more. The method is an extension of FPI and it outperforms the mainstreaming photon-counting methods in low SNR scenarios. By introducing two acquisition parameters and , the method allows users to make a trade-off between efficiency and robustness. For example, when SNR is low, larger and smaller would be helpful to avoid noise, although acquisition time might be longer. More importantly, the method does not need noise calibrations. Although it is not mentioned in the paper, the method has the ability to recover reflectivity images with the elapsed pulse number of each pixel. However, the elapsed pulse number is not only related to reflectivity, but also to noise level, and there is no such a mature theory or analytical solution to settle this problem, which is the priority of our future work.