1. Introduction
The continuous growth of thermal remote sensing applications, for example to investigate energy leak detection in buildings [
1] or to support precision agriculture [
2], has been determining an increasing demand for better spatial resolution, in order to analyze finer details of surface temperature patterns. However, the limited spatial resolution of the sensors operating in the thermal band (wavelengths in the range of 8–14 microns) [
3] makes the use of super-resolution (SR) methods particularly appealing to overcome hardware technological limitations. SR is the process of generating a higher resolution (HR) image starting from one or many lower resolution (LR) frames. In this context, the present paper aims to contribute to the development of a reliable and efficient SR approach, suitable for application to thermal remote sensing.
To be used with thermal imagery, SR algorithms should not only enhance the visual perception and readability of finer details, but also improve the fidelity of the radiometric values associated with pixels. Indeed, the spatial resolution affects temperature readings, especially for objects of a comparable or smaller size than the pixel footprint. This feature is crucial for thermal remote sensing applications, where the measurement of the surface temperature of hot spots and cold spots, as well as average values on homogeneous surfaces is often the primary goal. Hence, to be used for remote sensing purposes, the resulting SR images should be radiometrically accurate, more than visually appealing.
Some of the newest terrestrial thermal cameras provide built-in SR capabilities. For example, FLIR patented the UltraMax technology [
4], which is based on the rapid acquisition of 16 thermal frames in less than one second. The magnification factor is, however, limited to two, and there are some operational limitations (not too much movement from the user or the target and a minimum contrast in the scene). The Infratech company, instead, implemented a fast-rotating MicroScan™ wheel [
5], which enables the acquisition of four exposures per wheel resolution, each one offset laterally by half a pixel.
In the literature, many papers about SR on thermal images have been published. The approaches can be classified according to different criteria, such as the domain (spatial or spectral) and the algorithm (interpolation, regularized reconstruction, and learning). Moreover, a significant distinction consists in the number of LR input images: one for single image SR (SISR) or many for multiple image SR (MISR).
Some experiments are based on the fusion between thermal and visible images, both for satellite sensors (see [
6] among the most recent ones) and terrestrial (e.g., [
7,
8]). These methods have the disadvantage of involving more sensors and mixing radiometric information coming from different bands.
There are also methods entirely based on thermal imagery. They mostly are regularized reconstruction methods that minimize a cost function that is a weighted trade-off between a fit-to-data term and a regularization term. The fit-to-data term is related to the observation model, e.g., the features of the thermal camera or the noise distribution that affects the LR thermal images, while the regularization term represents prior knowledge on the HR image. The choice of the weight, also known as the regularization parameter, is crucial for a finely detailed HR reconstruction. In [
9], and more recently in [
10], the HR thermal image was obtained by a maximum a posteriori (MAP) estimation where the prior knowledge was represented by an L2 norm regularizer. In [
11], the authors used two regularization terms constituted by the Huber and a gradient based function with the aim of removing outliers with respect to a given threshold and preserving high-frequency structures. In [
12,
13], based on the assumption that there are a large number of similar patches in infrared images, non-local means filters (NLMF), steering kernel regression (SKR), and spatial cross-correlation (SCC) were used as priors to estimate the noise-free HR thermal image with fine texture details. A different approach was used in the most recent paper on infrared SR [
14], which developed an automatic interpolation scheme for MISR. In particular, the method relied on the correlation between registered LR image patches to obtain the super-resolved image.
Although [
9,
10,
11,
12,
13] provided good performances, they were not fully automatic, and a parameter setup was needed. Furthermore, the algorithms proposed in the cited papers usually applied only to single or to multiple image SR alternatively, with a magnification factor at most three. On the contrary, for applications in the geomatics domain, fully automatic algorithms that are flexibly adaptable either to single or multi-image processing are desirable. Indeed, some thermal imaging applications do not allow the acquisition of more than a single LR image for a particular scene due, for instance, to the quick relative motion between the camera and the object. This occurs in the case, for example, of aerial thermal surveys [
15] or in acquisitions subject to a quick change of surface temperature, when the observed surface is not in thermal equilibrium with the environment. On the contrary, in other cases, such as static ground inspections of thermally stable objects (e.g., [
16]), more LR frames of the same scene can be acquired and processed.
Taking into account the previous considerations, we propose a super-resolution algorithm that belongs to the class of regularized reconstruction methods. The contributions of this paper are summarized in the following four points. Unlike many other existing methods [
9,
10,
11,
12,
13], the algorithm: (1) is fully automatic, e.g., all the parameters are adaptively chosen; (2) it is suited for both SISR and MISR of thermal images; and (3) it allows a magnification factor greater than three, still preserving a limited computational cost. For these features, it is possibly implementable in the thermal camera firmware. (4) We introduce a new radiometric quality analysis for a specific region of interest of thermal images, such as cold or hot spikes. The results obtained by comparing the proposed algorithm with two existing automatic SR methods based on different approaches show better radiometric quality with a contained computational cost.
In
Section 2, we report some recent developments of super-resolution for natural images and, in
Section 3, we describe our automatic SR algorithm. To evaluate its performance, we carry out some experiments on thermal images acquired in different conditions, i.e., aerial and terrestrial acquisitions (
Section 4). In
Section 5, we evaluate the quality of the proposed algorithm, by performing a detailed radiometric analysis on the temperature values both for the peaks and the plateaus in the computed SR images. Since we are interested only in fully automatic methods for both SISR and MISR, we compare our method with two existing automatic algorithms: one designed for thermal images based on an interpolation approach [
14] and one of the most efficient algorithms based on neural networks [
17]. The results confirm the outstanding performance on thermal images of the proposed SR method.
2. SR Algorithms for Natural Images
In the last few years, the SR of natural images has been widely investigated. For the interested reader, we briefly review the current state-of-art, even if these algorithms have not been tested on thermal images. Finally, we present our new proposal. SR is known to be an ill-posed problem since different HR images can generate identical LR images. Moreover, HR and LR images are related by a sub-sampling function and a point spread function (PSF), which are ill-posed operators. Therefore, the regularized reconstruction methods can be suitably applied due to their good performances and flexibility. In practice, the solution is obtained by minimizing a weighted trade-off between a fidelity term and a regularization term. The weight used is commonly a positive scalar, which is named the regularization parameter. As clearly underlined in [
18], the reconstruction capability of these methods hinges on the selection of this hyper-parameter.
Since the first statistical approach for SR in [
19], other papers, such as [
18,
20,
21] and the references therein, used the regularized reconstruction approach.
All these approaches have some drawbacks: They require the tuning of many parameters during the optimization process, leading to time-consuming algorithms, which are not fully automatic. Hence, they are not suited for thermal super-resolution. To our best knowledge, the only fully automatic SR method was [
22]; however, the proposed strategy had a high computational cost, and it was suitable only for small images and tiny magnification factors. In the last few decades, a large class of learning based methods has been studied. All the methods belonging to this class rely on high representative models, which are capable of learning image priors from a set of example images. These methods are based on sparse coefficients representation [
23,
24] and more recently on deep neural networks (DNNs) [
17]. Despite the very good results, the application of deep learning procedures depends on the fixed set of example images; the training phase is time-consuming; and moreover, a trained model cannot manage different magnification factors.
3. The Adaptive Super-Resolution Algorithm
In our approach, the data fidelity term consists of the least-squares distance, and the regularization term is given by the discrete total variation (TV) operator [
25]. We set the L2 norm as the fit-to-data term since our thermal camera provides LR data with a Gaussian noise component. The TV function is widely used in image processing applications for its edge enhancing and noise removal properties. It is particularly suited for blocky images, which are common in many thermal remote sensing applications.
The proposed SR algorithm, named adaptive super-resolution (ASR), computes a sequence of approximate HR images
, of size
, where
s is the predetermined magnification factor. Let
be a set of
r low-resolution observed images of size
representing the same scene and
be the same images lexicographically reordered in vectors. In each iteration
k, we compute a new approximation
of the final high-resolution image as the solution of the following minimization problem:
where
H and
S represent, respectively, the blurring and downsampling operators (we assumed that the same PSF and sub-sampling operators apply to each image in the acquisition set) and
is the regularization parameter. The blurring matrix
H comes from the discretization of a convolution product with a 2D kernel
h. As the most general, we chose a Gaussian kernel:
where
and
are the mean and the standard deviation of the Gaussian kernel, respectively. The downsampling matrix
S reduces the dimension of the matrix on which it operates by removing rows and columns. The discrete TV function has the following expression:
where
is the forward difference approximation of the gradient of
x at the pixel
i, where
.
The choice of the regularization parameter
is crucial for the quality of the output image. Usually, the value of
is constant throughout the iterations and heuristically chosen, as in [
18,
20,
21,
26,
27]. We propose to update
at each algorithm iteration with a low-cost adaptive rule leading to a fully automatic method that does not require any information about the noise on the data.
We briefly describe how to solve the minimization problems (
1) and how to choose a suitable sequence
.
The starting guess
of the iterative process is essential for the convergence speed of the method. Many existing variational methods use bicubic or bilinear interpolated HR images as a starting guess, but the quality of the initial image is often not satisfactory. We propose an initial artifact free HR image obtained by a smoothing filter:
where
and
represent the transpose
H and
S matrices. We observe that, since the acquired images are not constant, we avoid the trivial case of
constant; hence,
is different from zero. The vector
, solving the minimization problem (
1), is computed by a sequence of Fast Iterative Shrinkage Thresholding Algorithm steps (FISTA) [
28] adapted to the SR problem. The inner denoising subproblems are solved by the accelerated forward backwards algorithm (FBA); for more details, see [
29]. To update the regularization parameter at each iteration, we create a decreasing sequence
by generalizing the formula proposed in [
30]. This property ensures that
converges to the desired solution as demonstrated in [
31]. Let us define the function
:
At each iteration
k, we choose the regularization parameter
as follows:
where
p is a given positive integer value. Therefore, we put the value of
p, used in the regularization parameter formula (
6), alongside the name of the adaptive super-resolution (ASR) algorithm. In the following, the method name is ASRp.
It was proven that the sequence
computed by (
6) is strictly decreasing [
30]. Concerning the starting value of
, we highlight that the following choice provided excellent results in our tests:
We note that at the starting iteration, due to the choice of
in (
7), we have that:
The algorithm iterations are stopped on the basis of the following criterion:
where
represents the relative change of the computed solutions at two subsequent iterations. In our application, the value
always guarantees accurate results.
Figure 1 schematically illustrates the iterative process. The input values
and
are computed in advance by Equations (
7) and (
4), respectively. Hence, the algorithm is parameter-free. As a stop condition, we use (
8).
5. Results
In this section, we present some numerical tests to investigate the proposed ASRp method and demonstrate its effectiveness. All the experiments were performed on a PC Intel(R) Core(TM) i5-6200U CPU, 2.40 GHz with 8.00 Gb RAM using MATLAB R2018b. We first present the results on simulations. In the case of a single image, we considered the acquired images I1 and I2 as the ground truth, and we performed a down-sampling of a factor of four to obtain synthetic low-resolution data. In the case of multiple images, we considered the synthetic datasets created from the I1 and I2 test images as described in
Section 4.3, and we down-sampled all the
images by a factor of four, obtaining a set of 32 LR images of size
. In this case, before applying the ASRp method, we needed to register the images of the dataset, since the method assumed that all the images were acquired from the same point of view. We adopted the registration procedure described in [
14], which relies on the mutual information technique [
34,
35]. Given a target and a reference frame having the same dimensions, the output of the registration process was a 3 × 3 rigid transformation matrix that permitted an accurate overlapping between the two images. The parameters contained in that matrix were the rotation angle, expressed in terms of direction cosines, and the two translation values along the orthogonal axes. As concerns the blurring kernel
H described in (
2), based on the technical information of the acquisition camera, we fixed
and
. To measure the performance of the algorithms, we used the mean squared error (MSE), the peak signal-to-noise ratio (PSNR), and the mean structural similarity (SSIM) indices. The MSE was the averaged square distance between the original HR image
Y and the reconstructed HR image
X:
The best quality was obtained by an MSE value close to zero. The PSNR, usually expressed in terms of the logarithmic decibel scale (dB), is defined as follows:
where, in this case, the highest values represent the best image quality. Typical values for good quality images are between 30 and 50 dB. The MSE and PSNR are the most simple and widely used error measurements [
36]. However, these quantities are sample dependent and not well suited for predicting the perceived quality of digital images. The SSIM, designed to improve MSE and PSNR methods, has the following expression:
where
X and
Y are the reconstructed HR and the original HR images of size
, respectively,
,
are the mean and standard deviation of
X and
Y, and
are constants. The SSIM parameter lies in the interval
, and values near one indicate a good visual image quality [
37].
5.1. Experiments on Simulated Data
We first analyzed the performance of the ASRp method on simulated data by varying the value of the parameter
p involved in the update of the regularization parameter
as in (
6) and by evaluating the proposed method with the performance parameters defined in (
10) and (
11).
Concerning the choice of
p, in
Figure 3, we plot the results obtained with
,
, and
for the SISR of I1 and I2 images. In all the graphics, the blue dotted line, the continuous red line, and the dashed green line represent the ASR1, ASR2, and ASR3 algorithms, respectively. We report the behavior of the objective function
, defined in Equation (
5), in Subfigures (a) and (c) and the trend of the PSNR parameter in Subfigures (b) and (d), over the increasing iteration number for the images I1 and I2. Since in all the SISR and MISR tests, we observed the same behavior when comparing those algorithms, we considered in the following only our ASR2 (
).
Regarding the SISR problem, we named our method ASR2_s when a single LR image was available. We compared the proposed ASR2_s method with two widely used SR methods based on interpolation strategies [
38], the nearest neighbor (NN), and the bicubic interpolation and with an algorithm belonging to the class of learning based methods, namely the enhanced deep super-resolution (EDSR) neural network [
17]. The first two algorithms had a low computational cost despite a not so high visual image quality. The EDSR was found as one of the best performing neural networks for super-resolution. It won the New Trends in Image Restoration and Enhancement (NTIRE)2017 Super-Resolution Challenge, and it is currently one of the state-of-the-art methods on benchmark datasets. We took from [
39] the pre-trained model with 16 residual blocks, 64 filters, and 1.5 M parameters.
Table 1 contains the values of the quality parameters and of the computational times obtained on both the test images I1 and I2.
Now, we analyze the performance of the ASR2 for MISR, that is more than one input image was available (we name our method as ASR2_m in the following).
Figure 4a plots the increasing trends of the PSNR parameter of ASR2_m by increasing the number of input images in the range
. We observed that about 12 images were sufficient to get the best result.
Figure 4b shows the computation time in seconds of ASR2_m varying the number of LR input images.
We now carry out comparisons with the results obtained by the interpolation method proposed in [
14] (RISR in the following), which is one of the most recent algorithms for thermal image SR, and it is specific for the multiple image case. In
Table 2, we report the PSNR and SSIM parameters, together with the computational time in seconds for both I1 and I2.
A detailed quantitative evaluation of the HR images obtained with the SR algorithms (both in the case of single and multiple SR) was performed by identifying in I1 and I2 some regions of interest (ROI). The ROIs chosen for our analysis are highlighted by the yellow, red, and green boxes in
Figure 2a,b. They contain different interesting features for the quantitative analysis.
Moreover, to better evaluate the HR images produced by the NN, ASR2_s, ASR2_m, and EDSR methods, we considered portions of a single row and column, highlighted with dotted lines in
Figure 2a,b (inside the ROIs). In
Figure 5a–d, we plot the reconstructions together with the ground truth for each considered segment.
In
Figure 6, we compare the reconstructions of details in the I1 image, applying a magnification factor equal to four (all the crops are represented in the same color map).
Regarding the quantitative measures of the observed methods, we distinguished between the ROIs R1 and R2, containing hot-cold spots or sharp edges, and R3 representing flat surfaces. In R1 and R2, we computed the following absolute errors:
where
and
are the maximum and minimum temperature values, respectively (we remind that
Y is the ground truth image and
X is the HR computed image). Conversely, we evaluated the quality of the flat region R3 through the standard deviation
where
are the values of the pixels in R3 and
is their mean.
The values for each ROI are reported in
Table 3 and
Table 4, highlighting the best results according to the following observations.
Column R1 in
Table 3 is relative to a single cold spike (
Figure 5a); hence, a small value of
represents the ability of the method to capture such a feature. The minimum value
, represented in boldface, was obtained by the ASR2_s. On the contrary, in
Table 4, the column R1 is relative to several hot spikes, represented in
Figure 5c. In this case, a small of value
represented the capacity of the method to reproduce such a situation, and the best result, highlighted in column
, was obtained by the ASR2_s. Column R2 of both tables represents an edge, and the average value, between
and
, evaluates its quality. In both cases, ASR2_m reached the best value.
5.2. Experiment with Real Data
The last experiment considered the set of thermal images of size
(
Section 4.3) representing the same scene of I2 described in
Section 4.2. Those images were given as registered LR input to the super-resolution methods.
Two portions of the HR images, obtained by the methods NN, bicubic, EDSR, RISR, ASR2_s, and ASR2_m, are shown in
Figure 7 and
Figure 8. The images reported in
Figure 7 and
Figure 8 show that the details were best reproduced by ASR2_m (
Figure 7 and
Figure 8f), while the NN, bicubic, and EDSR images appeared out of focus. Moreover, in both cases, we observed that the RISR method presented the same error pattern (
Figure 7 and
Figure 8d).
6. Discussion
As stated in
Section 1, all the existing thermal SR methods deal with a magnification factor at most equal to three. Since we propose a more forceful approach, all the results presented in
Section 5 are relative to super-resolution with a magnification factor equal to four.
In
Figure 3, we show how the choice of the positive integer
p of the method, introduced in (
6), affected the quality of the HR reconstruction, underlining once again the importance of a good guess for the regularization parameter. In all the tests carried out, we observed the same steeper decreasing behavior of the objective function, together with the faster increase of the PSNR when we used
instead of
. Moreover, we found that the value
did not improve the PSNR. Hence, (
6) represents a substantial improvement of the update rule introduced in [
30].
In
Table 1, we compare the SISR results on I1 and I2 with other SR methods based on different approaches. Among different existing methods for SISR, we chose those for which no parameter setup was needed, i.e., the bicubic and the nearest neighbor(NN) algorithms, both based on an interpolation approach, and the enhanced deep super resolution (EDSR) for the learning approach. The ASR2_s method was confirmed to be competitive in terms of SSIM and PSNR with the EDSR algorithm, while its execution time was higher. We observed, indeed, that this algorithm required a couple of days of training on a fixed set of examples, and the model trained fit only for a specific magnification factor.
We now examine the performance of the ASR2 method when many low-resolution images were available, also comparing it with other methods in terms of the PSNR and SSIM values. At first, we analyzed the sensitivity of the method concerning the number
r of input images. The PSNR value significantly increased when using less than about 12–13 images for both I1 and I2, and it remained almost constant for a more significant number of images (see
Figure 4a). Regarding the times,
Figure 4b shows that the workload was almost constant when considering from two to 32 starting images. Indeed, the number
r of images affected only the fit-to-data term of the objective function (
5), while the problem size depended on the dimension of the HR image, which was constant for each value of
r.
Concerning the comparison with the RISR method, we can see in
Table 2 that the SSIM parameter was almost the same for all the considered methods, while the PSNR value was higher for the ASR2_m algorithm. Moreover, the last column in
Table 2 points out that ASR2_m was about ten times faster than RISR. Comparing the quality measures in
Table 1 and
Table 2, we could conclude that a more significant number of input images produced more accurate results. Remarkably, our ASR2_m applied to a set of 32 LR images outperformed, in terms of PSNR value, the EDSR algorithm (see
Table 1).
By analyzing the workload in the case of single and multiple input images, we point out that the presence of multiple images positively affected the convergence speed, causing a decrease in the computational time, as shown by comparing the last column of
Table 1 and
Table 2. The lower computation time of ASR2_m was due to a smaller number of iterations performed by the method, when compared to ASR2_s for the same given tolerance
introduced in
Section 2.
To better visualize the details of the reconstructed HR images, we identify in
Section 5 particular regions of interest (ROIs) where we compared the results of the more efficient methods. We underline that the NN high-resolution image had been chosen because, reproducing the original values of the LR image, it allowed us to evaluate how much each method improved the initial data.
Figure 5a shows that the ASR2_s method worked better than the others on the cold spike in R1 ROI of I1 (
Figure 2a), since it better approximated the extreme value.
Figure 5b can be used to represent the behavior of the methods in reconstructing the edge of I1 in R2 ROI: we can see that the ASR2_m line was the closest to the ground truth.
As concerns the I2 test image, in
Figure 2b, we selected a row and a column inside the R1 and R2 ROIs (represented again by the dotted lines), and we plotted the relative SR reconstructions in
Figure 5c,d, respectively. Plot (c) presents temperature oscillations in a range of about four degrees, and we can observe that ASR2_s best reproduced the correct temperature values. Moreover, the ASR2_m method correctly approximated the edge in the plot (d).
Figure 6 confirms the previous considerations showing that the ASR2 methods (both in the case of SISR and MISR), which used the TV regularization, produced images with sharper contours; in particular, the black circle corresponding to the cold spike inside the R1 ROI of I1 was better detected by the ASR2_s algorithm (
Figure 6e).
In the case of thermal images, the PSNR and SSIM parameters were not sufficient to evaluate the HR reconstruction. A radiometric analysis was remarkable and required. To this aim, we used the results in
Table 3 and
Table 4. In these tables, observing column R1, we noticed that ASR2_s outperformed the other methods by more than
on both the cold spike (for I1) and hot spike (for I2). Therefore, it was the most advisable method for processing these image features. For the ROI R2, the best results were represented by the minimum average value between
and
, highlighted in boldface in column R2. ASR2_m outperformed the other methods of more than
and
for I1 and I2, respectively. Finally, the R3 columns of
Table 3 and
Table 4 are relative to homogeneous surfaces. In this case, the best results were obtained by the minimum value of the standard deviation, which was reached by ASR2_m for I2 and by the RISR method for I1 (we observe that in
Table 4, all the methods showed a very similar performance). We point out that ASR2_m removed at least
of the HR reference noise. Hence, we could conclude that the ASR2_s method produced the smallest errors on both the cold spikes (for I1) and hot spikes (for I2), resulting in being the more advisable method for the processing of these image features. The larger errors of the ASR2_m method in isolated hot and cold spikes were probably caused by two concurrent factors. On the one hand, small registration errors produced larger values of the regularization parameter, compared to ASR2_s, thus suppressing small outliers. On the other hand, we observed that the low sampling process acted on the temperature peaks by reducing their absolute value, due to the smoothing downsampling techniques applied. Moreover, the ASR2_m method was recommended for images with flat zones, sharp edges, and texture. Its efficiency was affected by the estimation rule of the regularization parameter and by the number of LR input images.
Concerning the experiments with real data, we observed that in the ASR2_s images (
Figure 7 and
Figure 8c), the edges were well enhanced, but the noise was still visible in the background, while the ASR2_m method produced very well-defined HR image details compared to the other methods. The NN, bicubic, and EDSR methods showed many artifacts when they reconstructed small details such as the roof tiles (
Figure 7a–c) and the round arches (
Figure 8a–c). The RISR method returned a high quality HR image, but with an artificial texture pattern (
Figure 7 and
Figure 8d). We point out that all the crops are represented with the same color map.
From all the experiments, we could conclude that the ASR2 methods always yielded good quality SR thermal images. In particular, where the image presented isolated temperature peaks, ASR2_s could reproduce them more accurately. On the contrary, the noisy background was better smoothed by ASR2_m, which returned high-quality images with fine details and sharp edges.
7. Conclusions
Thermal remote sensing imagery is characterized by low spatial resolution. For this reason, this paper presented a flexible method for SISR and MISR, allowing for a magnification factor larger than three without requiring any parameter setup. The proposed ASR_2 method was based on a TV-regularized approach and included an adaptive and low-cost computation of the regularization parameter, thus resulting in being fully automatic and computationally efficient.
A detailed analysis was conducted, comparing the proposed algorithm with the EDSR, one of the best performing deep neural networks for SR, and RISR, one of the most recent algorithms designed for thermal image SR. The experimental results proved the strength of the proposed approach in terms of visual evaluation, e.g., PSNR and SSIM. A careful radiometric analysis confirmed the reliable reconstructions of sharp details and isolated hot or cold spots, which are essential in thermal imagery and many remote sensing applications. This proposal gained at least accuracy in reconstructing temperature peaks compared to the EDSR and RISR, and it removed at least of the noise.
The flexibility of the approach made it particularly suitable for both aerial and terrestrial thermal remote sensing, especially for applications involving the detection and delineation of hot spots or temperature anomalies at a finer spatial scale.