1. Introduction
Continuous signals must be reduced to a discrete form for experimental data acquisition and processing, a process called sampling the signal. Therefore, a sampling rate must be selected, balancing the ability to retain important features from the data set with the speed of data acquisition and size of the data set. A famous theorem on this aspect is the Shannon–Nyquist theorem, which states that for a perfect reconstruction of a bandlimited signal, a minimum sampling rate of double the maximal frequency is required. Thus, for secure retention of important data features, a floor is placed on the minimum sampling rate allowed, resulting in a constraint for data acquisition speed.
Recently, finite rate of innovation (FRI) [
1] theory has been employed to achieve low sampling rates, with respect to the relevant traditional sampling schemes, in both ultrasound simulated and experimental data [
2], as well as in a simulated terahertz (THz) context in communications application [
3]. It has also been shown to handle the reconstruction of sparse signals robustly, such as in [
4] with the application for source resolution in radioastronomy showcased. Furthermore, in a frequency-domain optical coherence tomography context, an FRI method has achieved improved resolution and signal-to-reconstruction noise compared to the standard approach [
5]. This theory utilizes signals with a finite number of degrees of freedom per unit time, which pulse signals, such as those used in ultrasound and THz time-domain spectroscopy. This concept will be further explained and mathematically defined in the following section. Given knowledge of the pulse shape and number of reflections or pulses expected, a sampling rate below the Shannon–Nyquist limit can even be employed to achieve a full reconstruction of the sampled data [
3].
In this paper, FRI will be applied to both simulated and basic experimental THz data. THz covers the electromagnetic spectrum from frequencies of around 0.1 THz to 10 THz and has many inherent advantages by being non-ionizing, non-destructive and providing noncontact modalities. Therefore, THz has had great research interest in areas such as biomedical physics, security applications and material characterization over the past years [
6]. One constraint especially significant for biomedical in vivo imaging is the duration of the scan. For example, in the protocol our group developed for assessing skin hydration [
7], the patient was required to hold their arm stationary during the measurements, with any movement possibly affecting the recorded data. Thus, with a lower sampling rate these scans could be taken more quickly with less risk of this occurring and with the additional benefit of any time-dependent effects, such as occlusion of the skin [
8], being minimized.
In typical terahertz time-domain spectroscopy (THz-TDS) setups, a femtosecond laser is split into a pump and a probe beam. The pump beam excites the photoconductive emitter whilst the probe beam is controlled by an optical delay line to excite the detector at different THz signal time steps [
9]. Therefore, in the case of THz-TDS, the steps along the time delay line constitute the sampling points that the FRI method shown in this paper will seek to minimize. In addition to the standard measurement systems, this method could see use in single-pixel spectroscopic imaging methods [
10] by allowing for a lower sampling rate of temporal waveforms per pixel.
Our aim is to achieve lower sampling rates compared to current popular methods by utilizing FRI theory whilst maintaining similar experimental methodology and required foreknowledge of the sample. Although FRI has previously been applied to other fields such as ultrasound and to simulated THz data in communication applications, this is the first journal publication applying it to experimentally obtained THz-TDS data. By employing our THz FRI based method, we achieved low sampling rates in both simulated and experimental THz data sets whilst demonstrating agreement with the original simulated and measured signals, respectively.
2. Theory
The first building block for creating this FRI method is to simulate the THz pulse shape we expect from our experiment in a form which is compatible with later mathematical manipulations. In the frequency domain, sum-of-sincs (SoS) is the usual form utilized as the sampling kernel for these methods [
11]. This is because the kernel approximates a reflection or transmission response, and therefore is able to solve both non-periodic and periodic cases whilst having a finite duration itself which is easily mathematically manipulated. It is defined in general terms by [
2]:
where
is an integer in the chosen set of integers
,
is the frequency and
is the period containing an entire repetition of the SoS as this forms the repeating sampling kernel required for this method. The chosen integer number sets
and
are free parameters optimized for the specific application explored later in this section. In this work, we applied this method mainly in the time domain, as the experimental data used in this work as well as all THz-TDS are acquired in that domain. Thus, we required the time domain version of (1):
where
is the time and
for
whilst being zero elsewhere, limiting the time range to only containing one repetition of the sampling kernel. The result of setting the free parameters
to 1 and choosing
where
can be seen in
Figure 1a. Here, the central peak is surrounded by side lobes being far from the real THz pulse representation, demonstrating the need for further work to mold our sampling kernel. This comes in the form of applying a length-
symmetric Hamming window for the free parameters:
where the cardinality
.
Figure 1b shows the result of applying this Hamming window, with the side lobes being smoothed out to give a closer appearance of a single peak. To generate a sampling kernel which represents a single THz pulse, two of these SoS peaks were combined by offsetting and scaling one of the SoS pulses, as shown in
Figure 1c. The offset between the pulses was equal to the width of the center peak at 0 amplitude and the relative scaling difference was a factor of −0.6. These were selected by comparison to a real THz reference and can be automatically tailored to a specific experimental reference as demonstrated later in this paper.
Finite streams of pulses can be used to represent THz-TDS data, finite by virtue of the data acquisition range used, and the stream of pulses representing the THz waveforms constituting the data, i.e., reflections off boundaries between materials of different refractive indices. Therefore, let us consider a
-periodic stream of
pulses with amplitudes
located at distinct times
:
Here,
is a known pulse shape, which in our case has been constructed and defined in the previous paragraphs and is in the form of (2). For this consideration, we also have the constraints of
,
,
,
and an additional constraint on
Given that we have
pulses which are each fully described by two parameters, the amplitude and time location, we have
degrees of freedom per unit time and so a finite rate of innovation,
, of:
With the aim of achieving the minimum sampling rate whilst being able to adequately reconstruct the sample, we can target
samples per
. This is the ideal minimal number of samples required using this method, which can result in sub-Nyquist rates. However, this ideal sampling rate does not account for issues such as sampling points not containing useful information, i.e., not lying on the pulse, or the presence of noise and other aberrations. By defining the periodic extension of our pulse shape
as:
We can apply Poisson’s summation formula [
12] to rewrite (6) above as:
where
represents the Fourier transform of
. By substituting this result into (4):
where we have used
to denote the bracketed terms in the preceding line, which are the Fourier coefficients of that line, it can be shown [
1] that once at least
Fourier coefficients are known, the amplitudes and time locations of the stream of pulses representing our data can be found. This enables the reconstruction, or estimation, of our data, given a suitable sampling kernel.
We now require a way to calculate what these Fourier coefficients are. We begin by considering the uniform sampling of signal
of the form seen in (4) with a sampling kernel in the form of (2), which gives a sufficient characterization of
with uniform samples
at locations
:
By substituting (8) into (9):
Here, when
is a divisor of
, as it is in our case, this reduces line 2 in the above equation to the inverse discrete-time Fourier transform of
resulting in line 3 [
1]. Utilizing Prony’s method [
13], we now introduce the annihilation filter,
, stage of the method [
14], which is by definition required to satisfy the convolution:
By satisfying this constraint, and in the case of our chosen sampling kernel with
reducing to:
We can present
in the form of its
-transform:
This is because
has
zero valued null terms at
, allowing
to be represented by the convolution of
elementary filters [
15], each of which zero out one of the sum of
exponentials in
for the convolution in (11). Now, we can construct a rectangular Toeplitz matrix,
, from
:
where
. Note that the matrix has a number of columns equal to
and a number of rows equal to
. Along with the matrix form of
we can solve equation (11), with the additional constraint for our sampling number of
, by performing the singular value decomposition [
16] of
and selecting the eigenvalues corresponding to the smallest eigenvector, giving the annihilation filter coefficients for
. This allows us to find the roots
of
, with obtaining the time locations
clearly following. With the time locations found, the last piece of information needed is the amplitudes,
which can be calculated using the Vandermonde system [
17]:
where the exponent denotes the power to which the term is taken to. As we have distinct
this system always has a solution, providing our amplitudes.
4. Experimental Results
We have shown the great potential of our FRI model in the previous section, by achieving an accurate reconstruction of the simulated signal both without and with the presence of white Gaussian noise whilst using a relatively low sampling rate. However, there are further challenges to applying this method to experimental results; primarily, the accuracy of our sampling kernel in representing the THz pulse and accounting for how this pulse would change shape during transmission and reflection through different materials.
A TeraPulse 4000 from TeraView Ltd. was used in reflection geometry with an incidence angle of 30° to measure the air-plastic reflection off a thick piece of plastic. By using this as an experimental reference, the method described in
Section 2 was employed to create a SoS which closely resembled the shape of this reference. Ideally, the SoS sampling kernel created in this way would closely resemble the pulse in our simple experimental data obtained from a thick plastic sheet. This was measured using the same experimental procedure used for the reference acquisition and with the interface causing reflections between both air and plastic. The SoS form of our reference is required in most of the FRI method, except for the final stage of calculating the amplitudes. Instead of using the Vandermonde system, a standard least-squares minimization technique between the original measured signal and measured reference, repeated at the time locations found in the previous step, was used. For the recreation of the signal, we used the time location and amplitude estimates along with the measured reference. This achieved a signal shape more similar to the original, as the measured reference more closely resembled the correct shape.
Figure 4a contains this simple experimental data. The raw measurement is shown by the green line, with the reduced sampling points used as the input for our method shown by the red points and the reconstructed output of the FRI method shown by the blue dashed line. It can be seen that there are relatively few data points describing each of the two reflection pulses, but despite this the reconstruction is a close match to the original experimental data. This indicates that the time locations and amplitudes of the reflections found by our method are accurate, as by using these in combination with our measured reference as the basis for the reconstruction we obtained a similar result to the original data. However, it can be noticed that there is a mismatch in the amplitudes of the second halves of the pulses between the raw experimental data and the reconstructed result from our model. As the amplitudes for the first halves of the pulses are a close match, this is likely the result of an imperfect estimation of the pulse shape. Compared to the standard amount of data points the THz system we were using measures, we down-sampled by a factor of over 45 to obtain the sampling points shown, which resulted in a time sampling interval of 0.33 ps. This indicates that by using this method, fewer data points can be measured and thus a much shorter data acquisition time can be achieved.
Figure 4b shows the frequency domain version of the data presented in
Figure 4a, after it has undergone a fast Fourier transform (FFT) and been normalized. The low sampling rate FFT, shown by the red line, begins to diverge from the fully sampled experimental data FFT, shown in green, from around 0.7 THz. For frequencies larger than 1.3 THz there is mostly a very significant difference in the normalized FFT amplitudes, showing the frequency domain inaccuracy of the low sampling rate data at these frequencies. This is because the sampling interval of 0.33 ps corresponds to a Nyquist sampling frequency of 1.51 THz. Crucially, the FFT of the reconstruction from the low sampling points using our FRI method, shown by the blue line, does not share this divergence and inaccuracy. This demonstrates that our method accurately reconstructs the frequency domain data of THz-TDS measurements taken at sampling rates below the Nyquist frequency, providing the benefit of quicker measurements whilst ensuring the retention of frequency domain information.
More complicated experimental situations were investigated; however, it was found that for low sampling rates the SoS sampling kernel proved to be too rigid to account for the varying pulse shape as it reflected off boundaries and propagated through different materials. Thus, an interesting avenue of future research would be to investigate more accommodating sampling kernel models for THz pulses. Despite this issue, we have shown the great potential of this method to achieve accurate experimental data processing at low sampling rates, allowing for quicker data acquisition and processing once a more versatile sampling kernel has been developed.