1. Introduction
Images are one of the most important forms of data in the modern era. They are used to record a wide variety of physical scenes, ranging from scientific measurements to personal memories. Nevertheless, the recorded images are not always captured under ideal conditions and often require post-processing techniques to enhance their quality. These techniques include brightness and contrast enhancement, saturation and chromaticity adjustments, and artificial high-dynamic-range image creation, which are now accessible through smartphone applications [
1,
2]. With the recent developments in diffractive lenses and metalenses, it is highly likely that future cameras will utilize such flat lenses for digital imaging [
3,
4]. Two main problems foreseen with the above lenses are their sensitivity to wavelength changes and the technical challenges in manufacturing such lenses with a large diameter. In addition to the above, spatio-spectral aberrations are expected, as with any imaging lens. The recorded intensity in a linear, shift-invariant incoherent imaging system can be expressed as I = O ⊗ PSF + N, where I is the recorded image, O is the object function, PSF is the point spread function, N is the noise, and '⊗' is a 2D convolution operator. The task is to extract O from I as close as possible to reality. As seen from the equation, if the PSF is a sharp, Kronecker delta-like function, then I is a copy of O with a minimum feature size given as ~λ/NA. However, due to aberrations and the limited NA, the recorded information is not an exact copy of O but a modified version, namely the blurred image. Therefore, computational post-processing is required to digitally correct the recorded information for spatial and spectral aberrations. In addition to spatio-spectral aberrations, there are other situations where deblurring is essential. Just as human vision has a least distance of distinct vision (LDDV), there is a minimum distance below which it is difficult to image an object at the maximum resolution of a smartphone camera. In such cases, the recorded image appears blurred. It is necessary to develop computational imaging solutions to the above problems.
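For illustration, the forward model above can be simulated in a few lines of Python. This is a minimal sketch assuming NumPy and SciPy are available; the function name and noise model are ours, not part of the original work.

```python
import numpy as np
from scipy.signal import fftconvolve

def record_image(obj, psf, noise_std=0.01, rng=None):
    """Simulate I = O (x) PSF + N for a linear, shift-invariant incoherent system."""
    rng = np.random.default_rng() if rng is None else rng
    psf = psf / psf.sum()                          # normalize the PSF energy
    blurred = fftconvolve(obj, psf, mode="same")   # 2D convolution with the PSF
    return blurred + noise_std * rng.standard_normal(obj.shape)
```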
Deblurring can be achieved by numerous techniques such as the Lucy-Richardson algorithm (LRA), matched filter (MF), phase-only filter (PoF), Wiener filter (WF), regularized filter algorithm (RFA), non-linear reconstruction (NLR), etc. [
5,
6,
7]. In our previous studies [
5,
6], we thoroughly investigated the performances of the above algorithms and found that NLR performed better than LRA, MF, PoF, WF, and RFA. In those studies, severe distortions were introduced using thin scatterers. Recently, NLR was implemented to deblur images recorded with focusing errors when imaging with Cassegrain objective lenses at the Australian Synchrotron, and the performance was not satisfactory. Therefore, in that research work, NLR was combined with LRA to create a new deblurring method called the Lucy-Richardson-Rosen algorithm (LR
2A) [
8,
9,
10].
LR
2A was then successfully implemented to deblur images recorded using a refractive lens with focusing errors under a narrow-bandwidth (~20 nm) incoherent light source [
11]. In this study, LR
2A was applied to blurred images recorded with white light, which is both spatially and temporally incoherent with a broad bandwidth, using the latest smartphone cameras. This is a significant step, as the proposed method can be used to extend the LDDV of smartphones. In all the previous studies, the deblurring methods were implemented in the invasive mode. In this study, we demonstrate both invasive and non-invasive modes of operation. There are, of course, numerous methods of deblurring developed for different applications [
12]. Since it was already established that NLR had a better performance than the commonly used deblurring methods and LR
2A evolved from NLR and LRA, in this manuscript, the performance of LR
2A was compared only with the parent methods NLR and LRA.
Deep-learning-based pattern recognition methods gradually overtook signal-processing-based pattern recognition approaches in recent years [
5,
13]. In deep learning, in the first training step, where the network is trained using a large data set, focused images are used. The performances of deep-learning methods are affected by the degree of blur and distortion introduced during recording. It has been shown that invasive indirect imaging methods are able to deblur images in the presence of stronger distortions than non-invasive approaches [
14]. Most imaging situations are linear by nature, and so invasive and non-invasive methods can be applied wherever appropriate. Therefore, deblurring approaches can improve the performances of deep-learning methods. If the aberrations of the imaging system are thoroughly studied and understood, it is possible to apply even non-invasive methods that use synthetic PSFs obtained from a scalar diffraction formulation to deblur images. The concept figure is shown in
Figure 1.
The manuscript consists of eight sections. The methodology is presented in Section 2. The simulation results are presented in Section 3. Section 4 contains the results of the optical experiments. An introduction to pretrained deep-learning networks is given in Section 5. The deep-learning experiments are presented in Section 6. In Section 7, the results are discussed. The conclusion and future perspectives of the study are presented in Section 8.
2. Materials and Methods
Deep-learning-based image classification methods are highly sensitive to image blur. There are numerous deblurring techniques developed to process the captured distorted information into high-resolution images [
15,
16,
17]. One straightforward approach to deblurring is given as O′ = I ⊛ PSF, where '⊛' is a 2D correlation operator and O′ is the reconstructed image. As seen, the object information is sampled by the autocorrelation of the PSF (PSF ⊛ PSF) with some background noise. The above method of reconstruction is commonly called the MF, which generates significant background noise, and in many cases, a PoF or a WF was used to improve the signal-to-noise ratio [
18]. All the above filters work on the fundamental principle of pattern recognition, where the PSF is scanned over the object intensity, and every time a pattern match is achieved, a peak is generated. In this way, the object information is reconstructed. The NLR method of deblurring can be expressed as O′ = |ℱ⁻¹{|ℱ(PSF)|^α exp[−i·arg(ℱ(PSF))] |ℱ(I)|^β exp[i·arg(ℱ(I))]}|, where ℱ denotes the 2D Fourier transform, and α and β were tuned between −1 and 1 until a minimum entropy was obtained [7]. Even though NLR has a wide applicability range involving different optical modulators, the resolution is limited by the autocorrelation function of the PSF [
19]. This imposes a constraint on the nature of PSF and, therefore, on the type of aberrations that can be corrected. So, NLR is expected to work better with PSFs with intensity localizations than PSFs that are blurred versions of point images.
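A compact sketch of the NLR reconstruction described above is given below (NumPy only; the variable names are illustrative, and the exact normalization used in the original implementation may differ).

```python
import numpy as np

def nlr(I, psf, alpha=0.0, beta=0.9):
    """Non-linear reconstruction: tune the spectral magnitudes of the PSF and the image."""
    PSF_f = np.fft.fft2(psf)
    I_f = np.fft.fft2(I)
    # Magnitudes raised to alpha/beta; phases kept (conjugated for the PSF, as in correlation).
    filt = np.abs(PSF_f) ** alpha * np.exp(-1j * np.angle(PSF_f))
    sig = np.abs(I_f) ** beta * np.exp(1j * np.angle(I_f))
    # fftshift re-centers the correlation result when the PSF is centered in its array.
    return np.abs(np.fft.fftshift(np.fft.ifft2(filt * sig)))
```

In practice, α and β would be scanned and the pair giving the lowest entropy of the reconstruction retained, as described in the text.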
Another widely used deconvolution method is the iterative Lucy-Richardson algorithm (LRA) [
8,
9]. The (
n + 1)th solution is given as O_{n+1} = O_n × {[I/(O_n ⊗ PSF)] ⊛ PSF′}, where ⊛ denotes 2D correlation, O_n is the nth estimate, and PSF′ is the flipped version and complex conjugate of the PSF. The loop begins with an initial guess, which is
I, and gradually converges to the maximum likelihood solution. Unlike NLR, the resolution of LRA is not constrained by the autocorrelation function. However, LRA has only a limited application range and cannot be implemented for a wide range of optical modulators such as NLR. Recently, a new deconvolution method LR
2A was developed by replacing the correlation in LRA by NLR [
10]. The schematic of the reconstruction method is shown in
Figure 2. The algorithm begins with a convolution between the PSF and the current estimate, with I as the initial guess, which results in I′. The ratio between the two matrices I and I′ is then correlated with the PSF; in LR2A, this correlation is replaced by NLR. The resulting residue is multiplied with the current estimate, and this process is repeated until a maximum likelihood solution is obtained. The deblurred images obtained from NLR, LRA, and LR
2A are then fed into the pretrained deep-learning networks instead of blurred images for image classification.
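The LRA and LR2A loops described above can be sketched as follows, reusing the nlr() helper from the previous snippet. This is a sketch under the stated assumptions, not the authors' reference implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def lra(I, psf, iterations=100, eps=1e-12):
    """Classical Lucy-Richardson iteration."""
    est = I.astype(float).copy()                 # initial guess is the recorded image
    psf = psf / (psf.sum() + eps)
    psf_flip = psf[::-1, ::-1]                   # flipped PSF (correlation kernel)
    for _ in range(iterations):
        ratio = I / (fftconvolve(est, psf, mode="same") + eps)
        est = est * fftconvolve(ratio, psf_flip, mode="same")
    return est

def lr2a(I, psf, alpha=0.0, beta=0.9, iterations=8, eps=1e-12):
    """Lucy-Richardson-Rosen algorithm: the LRA correlation step is replaced by NLR."""
    est = I.astype(float).copy()
    psf = psf / (psf.sum() + eps)
    for _ in range(iterations):
        I_prime = fftconvolve(est, psf, mode="same")   # forward blur of the current estimate
        ratio = I / (I_prime + eps)                    # residual ratio
        est = est * nlr(ratio, psf, alpha, beta)       # multiplicative update via NLR
    return est
```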
3. Simulation Studies
In the present work, LR
2A was used to deblur images captured using lenses with low NA and spatial and spectral aberrations. Furthermore, we compared the results with the parent benchmarking algorithms, such as LRA and NLR, to verify their effectiveness. The simulation was carried out in MATLAB with a matrix size of 500 × 500 pixels, pixel size of 8 µm, and central wavelength of
λ = 632.8 nm. The object and image distances were set as 30 cm. A diffractive lens was designed with a focal length of
f = 15 cm for the central wavelength, with the radius of the zones given as r_s ≈ √(sλf), where s is the order of the zone. This diffractive lens is representative of both diffractive lenses and metalenses that work based on the Pancharatnam-Berry phase [
20]. A standard test object ‘Lena’ in greyscale was used for this study.
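The diffractive lens and its aberrated PSFs could be synthesized along the following lines using a scalar (Fresnel) diffraction formulation; the grid, distances, and wavelengths follow the values quoted in the text, but the propagation routine and sampling choices are our assumptions rather than the authors' exact code.

```python
import numpy as np

N, px = 500, 8e-6                 # 500 x 500 grid, 8 um pixels
wl0, f = 632.8e-9, 0.15           # design wavelength and focal length (m)
u = v = 0.30                      # object and image distances (m)

x = (np.arange(N) - N // 2) * px
X, Y = np.meshgrid(x, x)
R2 = X**2 + Y**2

# Binary Fresnel zone lens: zone boundaries at r_s ~ sqrt(s * wl0 * f), alternate zones blocked.
zone_index = np.floor(R2 / (wl0 * f)).astype(int)
lens = (zone_index % 2 == 0).astype(float)

def fresnel_prop(field, wl, z):
    """Single-step Fresnel propagation via the transfer-function method (sampling not checked)."""
    fx = np.fft.fftfreq(N, d=px)
    FX, FY = np.meshgrid(fx, fx)
    H = np.exp(-1j * np.pi * wl * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * H)

def synthetic_psf(wl, dz=0.0):
    """PSF of a point source at distance u + dz imaged through the zone lens onto the sensor at v."""
    diverging = np.exp(1j * np.pi * R2 / (wl * (u + dz)))   # paraxial spherical wave at the lens
    return np.abs(fresnel_prop(diverging * lens, wl, v)) ** 2

psf_focus = synthetic_psf(wl0)             # in-focus PSF at the design wavelength
psf_defocus = synthetic_psf(wl0, dz=0.05)  # focusing error of 50 mm
psf_chromatic = synthetic_psf(500e-9)      # chromatic aberration at 500 nm
```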
In the first study, the NA of the system was varied by introducing an iris in the plane of the diffractive lens and varying its diameter. The images of the PSF and object intensity distributions for the radius of the aperture
R = 250, 125, 50, and 25 pixels are shown in
Figure 3. Deblurring results with NLR (
α = 0.2,
β = 1) are shown in
Figure 3. The deblurring results of LRA and LR
2A are shown for the above cases, where the number of iterations
p required for LRA was at least 10 times that of LR
2A, with the last case requiring 100 for LRA and only 8 for LR
2A (
α = 0,
β = 0.9). Comparing the results of NLR, LRA, and LR
2A, it appeared that the performances of LRA and LR
2A were similar, while NLR failed to deblur the images due to the blurred autocorrelation function of the PSF.
In the next study, once again, spatial aberrations in the form of focusing errors were introduced into the system. The imaging condition (1/
u + 1/
v = 1/
f), where
u and
v are object and image distances, was disturbed by changing
u to
u + Δ
z, where Δ
z = 0, 25 mm, 50 mm, 75 mm, and 100 mm. The images of the PSF, object intensity patterns, and deblurring results with NLR, LRA, and LR
2A are shown in
Figure 4. It was evident that the results of NLR were better than LRA, and the results of LR
2A were better than both for large aberrations. However, when Δ
z = 25 mm, LRA was better than NLR, once again, due to the blurred autocorrelation function.
To understand the effects of spectral aberrations, the illumination wavelength was varied from the design wavelength as
λ = 400 nm, 500 nm, 600 nm, and 700 nm. Since the optical modulator was a diffractive lens, unlike a refractive lens, it exhibited severe chromatic aberrations. The images of the PSFs, object intensity patterns, and deblurring results with NLR, LRA, and LR
2A are shown in
Figure 5. As seen from the figures, the results obtained from LR
2A were significantly better than those of LRA and NLR. Another interesting observation can be made from the results. When the intensity distribution was concentrated in a small area, LRA performed better than NLR, and vice versa. In all cases, the optimal solution of LR2A aligned with one of the cases of NLR or LRA. For concentrated intensity distributions, the results of LR2A aligned towards LRA; in other cases, the results of LR2A aligned with NLR. In all cases, LR2A performed better than NLR and LRA. It must be noted that in the cases of focusing errors due to changes in distance and wavelength, deblurring with the different methods improved the results. However, for the first case, when the lateral resolution was low due to the low NA, the deblurring methods did not improve the results as expected.
From the results shown in
Figure 3,
Figure 4 and
Figure 5, it is seen that the performance of LR
2A was better than LRA and NLR. The number of possible solutions for LR
2A was higher than that of NLR and LRA. The control parameter in the original LRA was limited to one, which was the number of iterations p. The number of control parameters of NLR was two, α and β, resulting in a total of m² solutions, where m is the number of states from (αmin, βmin) to (αmax, βmax). The number of possible solutions of LR2A was therefore m²p. Quantitative studies were carried out next for two cases that were aligned towards the solutions of NLR and LRA, using the structural similarity index measure (SSIM) and the mean squared error (MSE).
The data from the first column of
Figure 5 were used for quantitative comparative studies. As seen from the results in
Figure 6, the maximum SSIM values obtained using LRA, NLR, and LR2A were 0.595, 0.75, and 0.86, respectively. The minimum MSE values for LRA, NLR, and LR2A were 0.028, 0.01, and 0.001, respectively. The above values confirmed that LR2A performed better than both LRA and NLR, and that NLR performed better than LRA. The regions of overlap between SSIM and MSE reassured the validity of this analysis.
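The SSIM and MSE comparison could be reproduced along these lines, assuming scikit-image is available and that ground_truth and deblurred are same-sized floating-point images (hypothetical variable names).

```python
from skimage.metrics import structural_similarity, mean_squared_error

def score(ground_truth, deblurred):
    """Return (SSIM, MSE) between the reference object and a deblurred reconstruction."""
    ssim = structural_similarity(
        ground_truth, deblurred,
        data_range=ground_truth.max() - ground_truth.min())
    return ssim, mean_squared_error(ground_truth, deblurred)
```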
4. Optical Experiments
Smartphones, even with all their advanced lens systems and software, suffer from certain limitations. One such limitation is the LDDV. In such cases, the blurred images can be deblurred using LR
2A. A preliminary invasive study was carried out on simple objects consisting of a few points using a quasi-monochromatic light source and a single refractive lens [
11]. However, the performance was not high due to the weak intensity distributions from a pinhole, scattering, and experimental errors. In this study, the method was evaluated again in both invasive and non-invasive modes. In the invasive mode, the PSF was obtained from the recorded image of the object, either from isolated points or by creating a guide star in the form of a point object added to the object. In the non-invasive mode, the PSF was synthesized within the computer for different spatio-spectral aberrations using Fresnel propagation, as described in
Section 3. To examine the method for practical applications, we projected a test image—Lena—on a computer monitor and captured it using two smartphone cameras (a Samsung Galaxy A71 with a 64-megapixel (f/1.8) primary camera and a OnePlus Nord 2CE with a 64-megapixel (f/1.7) primary camera). The object projection was adjusted to a point where the device’s autofocus software could no longer adjust the focus. To record the PSF, a small white dot was added to the image and then extracted manually. The images were recorded ~4 cm from the screen with different point sizes of 0.3, 0.4, and 0.5 cm. They were then fed back to our algorithm and reconstructed using LR
2A (
α = 0.1,
β = 0.98, and two iterations). The recorded raw images and the reconstructed ones of the hair region of the test sample are shown in
Figure 7. Line data were extracted from the images as indicated by the lines and plotted for comparison. In all the cases, LR
2A improved the image resolution. What appeared in the recorded image as a single entity could be clearly discriminated as multiple entities using LR
2A, indicating a significant improvement in performance, as seen in the plots where a single peak was replaced by multiple peaks.
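In the invasive mode, the guide-star PSF could be cut out of the recorded frame roughly as follows; the dot location and patch size are assumptions, since in the experiment the PSF was extracted manually.

```python
import numpy as np

def extract_psf(recorded, row, col, half=64):
    """Crop a patch around the guide-star dot and normalize it for use as the PSF."""
    patch = recorded[row - half:row + half, col - half:col + half].astype(float)
    patch -= patch.min()                      # remove the local background level
    return patch / (patch.sum() + 1e-12)      # normalize the energy to 1
```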
For the non-invasive mode, the Lena image was displayed on the monitor and recorded using the Samsung Galaxy A71. The fully resolved image and the image blurred due to recording close to the screen are shown in
Figure 8a,b, respectively. One of the reconstruction results using NLR (
α = 0,
β = 0.9), LRA (ten iterations), and LR2A (α = 0, β = 1, and three iterations) are shown in
Figure 8c–e, respectively. The image of the synthesized PSF using scalar diffraction formulation is shown in
Figure 8f. Two areas with interesting features are magnified for all the three cases. The results indicate a better quality of reconstruction with LR
2A. The image could not be completely deblurred, but it could be improved without invasively recording the PSF. The deblurred images were sharper and contained more information than the blurred image. The proposed method can be converted into an application for deblurring images recorded using smartphone cameras.
5. Pretrained Deep-Learning Networks
Deep-learning experiments rely on the quality of the training data sets, which, in many cases, may be blurred due to different types of aberrations. In many deep-learning-based approaches, adding diverse data sets consisting of focused and blurred images may affect the convergence of the network. In such cases, it might be useful to attach a signal-processing-based deblurring method to reduce the training time and the size of the data sets, and to improve classification. In this study, deep-learning experiments were carried out for the classification of images with diverse spatio-spectral aberrations using LRA, NLR, and LR2A. A “bell pepper” image was blurred with different spatio-spectral aberrations and deblurred using the above algorithms LRA, NLR, and LR2A. The images obtained from LR2A were much clearer, with sharper edges and finer details, in comparison to the other two algorithms. The NLR method resulted in an image reconstruction that was slightly oversmoothed, with some loss of fine details. The LRA method produced a better result than NLR, but it was relatively slower than the other two methods, and the improvement was not as significant as with LR2A. Overall, LR2A was the most effective in removing the blur and preserving the fine details in the image.
Several pretrained deep-learning networks such as squeezenet, inceptionv3, densenet201, mobilenetv2, resnet18, resnet50, resnet101, xception, inceptionresnetv2, shufflenet, nasnetmobile, nasnetlarge, darknet19, darknet53, efficientnetb0, alexnet, vgg16, vgg19, and GoogLeNet were tested to validate the efficiency of LR
2A [
21]. Pretrained networks are highly complex neural networks trained on enormous datasets of images, allowing them to recognize a wide range of objects and features in images. MATLAB offers a variety of these models that can be used for image classification, each with its own unique characteristics and strengths. One such model is Darknet53, a neural network architecture used for object detection. Composed of 53 layers, this network is known for its high accuracy in detecting objects. It first came to prominence in the revolutionary YOLOv3 object detection algorithm. Another powerful model is EfficientNetB0, the baseline of the EfficientNet family of networks specifically designed for performance and efficiency. This model uses a unique scaling approach that balances model depth, width, and resolution, producing significant results in image classification tasks.
InceptionResNetV2 is another noteworthy model, combining the Inception and ResNet modules to create a deep neural network that is both accurate and efficient. It first gained widespread recognition in the ImageNet Large Scale Visual Recognition Challenge of 2016, where it achieved state-of-the-art performance in the classification task. NASNetLarge and NASNetMobile are additional models that were discovered using neural architecture search (NAS). These models are highly adaptable, with the flexibility to be customized for a wide range of image classification tasks. Finally, ResNet101 and ResNet50 are neural network architectures that belong to the ResNet family of models. With 101 and 50 layers, respectively, these models are widely used in image classification tasks and are renowned for achieving state-of-the-art performance on many benchmark datasets. Overall, the diverse range of pretrained models available in MATLAB provides a wealth of possibilities for performing complex image classification tasks. Detailed analyses of the different networks and computational algorithms such as NLR, LRA, and LR
2A are given in the
Supplementary Materials. GoogLeNet is a convolutional neural network developed by researchers at Google, which is one of the widely used methods for image classification [
22].
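The experiments in this work used MATLAB's pretrained networks; as a rough stand-in, the same classification step can be sketched in Python with torchvision's pretrained GoogLeNet. The file name is hypothetical, and the exact preprocessing differs from MATLAB's pipeline.

```python
import torch
from torchvision import models, transforms
from PIL import Image

weights = models.GoogLeNet_Weights.IMAGENET1K_V1
model = models.googlenet(weights=weights).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),             # GoogLeNet expects 224 x 224 RGB input
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("deblurred_bell_pepper.png").convert("RGB")   # hypothetical file name
with torch.no_grad():
    probs = torch.softmax(model(preprocess(img).unsqueeze(0)), dim=1)[0]

top5 = torch.topk(probs, 5)
labels = weights.meta["categories"]
for p, idx in zip(top5.values, top5.indices):
    print(f"{labels[idx]:>20s}  {p.item():.3f}")
```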
6. Deep-Learning Experiments
Feeding a focused image is as important as feeding a correct image into the deep-learning network for optimal performance. This is because blur can cause significant distortion to an image, which may then be mistaken for another image. The pretrained GoogLeNet in MATLAB accepts only color (RGB) images with a size of 224 × 224 for classification. In this experiment, a color test image, “bell pepper”, was used. The red, green, and blue channels were extracted from the color image, the chromatic aberration of a diffractive lens was applied to each channel for a central wavelength
λc = 550 nm, and the resulting blurred channels were fused back into a color image. A significantly high aberration was applied to test the limits of the deblurring algorithm. The blurred color image can be written as I_B = [R ⊗ PSF_R, G ⊗ PSF_G, B ⊗ PSF_B], where R, G, and B are the red, green, and blue channels, and PSF_R, PSF_G, and PSF_B are the corresponding channel PSFs. The blurred image was then analyzed using the pretrained GoogLeNet. The schematic of the process of applying blur to the different color channels to obtain the blurred color image, together with the analysis results from GoogLeNet, is shown in
Figure 9. GoogLeNet classified the blurred “bell pepper” image as jellyfish with more than 70% probability and spotlight, scuba diver, projector, and traffic light with reduced probabilities.
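The channel-wise blurring and fusion described above can be sketched as follows, reusing the record_image(), lr2a(), and synthetic_psf() helpers from the earlier snippets. The per-channel wavelengths are illustrative assumptions, and the zone-lens design wavelength follows the earlier sketch rather than the 550 nm quoted here.

```python
import numpy as np

CHANNEL_WL = {"R": 650e-9, "G": 550e-9, "B": 450e-9}   # assumed channel wavelengths

def blur_color(image_rgb):
    """Blur each channel with the PSF of its wavelength and fuse back into a color image."""
    out = np.zeros_like(image_rgb, dtype=float)
    for i, key in enumerate(("R", "G", "B")):
        psf = synthetic_psf(CHANNEL_WL[key])
        out[..., i] = record_image(image_rgb[..., i].astype(float), psf, noise_std=0.0)
    return out / (out.max() + 1e-12)

def deblur_color(blurred_rgb, alpha=0.0, beta=0.9, iterations=8):
    """Deblur each channel with LR2A using the matching PSF, then fuse the channels."""
    out = np.zeros_like(blurred_rgb, dtype=float)
    for i, key in enumerate(("R", "G", "B")):
        psf = synthetic_psf(CHANNEL_WL[key])
        out[..., i] = lr2a(blurred_rgb[..., i].astype(float), psf, alpha, beta, iterations)
    return np.clip(out / (out.max() + 1e-12), 0.0, 1.0)
```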
The deblurring methods NLR, LRA, LR
2A were applied to different color channels and after deblurring, the channels were fused to obtain the deblurred color image. The above process is shown in
Figure 10. As seen, NLR and LRA reduced the blur to some extent, but the deblurred image was not classified as “bell pepper” with even a 3% probability. However, the results obtained for LR
2A had “bell pepper” as one of the possibilities, which shows that LR
2A is a better candidate than LRA and NLR for improving the classification probabilities of pre-trained deep-learning networks when the images are blurred.
Since it is established from the results shown in
Figure 10 that LR
2A performed better than NLR and LRA, in the next experiment, with different types of aberrations, only classification results for blurred image (BI) were compared with those obtained for LR
2A. Blurred and deblurred images obtained from LR
2A of the test object were loaded, and their classification was carried out. The results obtained for a typical case of blur and for the deblurred image are shown in
Figure 11a,b, respectively. It can be seen from
Figure 11a that GoogLeNet could not identify “bell pepper” and classified it as a spotlight. In fact, the label ‘bell pepper’ did not appear in the top 25 classification labels (25th with the probability of 0.3%), whereas for the LR
2A reconstruction, the classification results were improved multifold, and the classification probability was about 36%. A few other blurring cases were also considered (Figure 8c): A (R = 100 pixels), B (R = 200 pixels), C (Δz = 0.37 mm), D (Δz = 0.38 mm), E (Δz = 0.42 mm), F (Δλ = 32.8 nm), and G (Δλ = 132.8 nm). In all the cases, the image classification probability was significantly improved for LR
2A.
7. Discussion
The study presented here linked imaging technology and deep learning in a new way. In most of the previous studies [
23,
24,
25], deep learning was presented as a tool to improve imaging. In this study, for the first time, we showed the possibility of improving the classification performance of deep-learning networks with optical and signal-processing tools. The data used for training deep-learning networks and the original data are often assumed to be free of aberrations, which is not the case in reality. From this study, it is quite clear that even minor spatio-spectral aberrations can significantly affect the performances of deep-learning networks. By attaching a robust deblurring method to a deep-learning network, it is possible to improve the performance significantly in the presence of aberrations.
The above idea is different from developing deep-learning methods to deblur blurred images. Even in such cases, if the input dataset consists of both blurred and focused images, fitting a deep-learning network over such diverse datasets is a challenging task. In recent years, physics-informed deep-learning methods have been gaining attention [
26]. The proposed method can be applied in a similar fashion, as a blurred image can be detected by observing the strength of the high spatial frequencies and the classification probability [
27,
28].
LR
2A is a recently developed deblurring algorithm that had so far been demonstrated only with spatially incoherent but temporally coherent illumination, and only invasively [
10,
11]. In this study, for the first time, LR
2A was implemented for white-light illumination using the latest smartphones, in both invasive and non-invasive modes. In the non-invasive approach demonstrated, the synthetic PSFs were obtained by a scalar diffraction formulation. The method was implemented on datasets with aberrations for many deep-learning networks. The performance of deblurring-assisted GoogLeNet for datasets with spatio-spectral aberrations is presented. The performances of about 19 pretrained deep-learning networks were evaluated for NLR, LRA, and LR
2A and the detailed results are presented in the
Supplementary Materials. In all the cases, it was evident that LR
2A significantly improved the classification performance compared to NLR and LRA. It is true that there are numerous deblurring methods developed for various applications, and comparing all the existing methods is not realistic. Since it was well established that NLR performed better than commonly used deblurring methods such as MF, PoF, WF, and RFA, in this study, the comparison was made only with the parent methods, namely LRA and NLR.
8. Summary and Conclusions
LR
2A is a recently developed computational reconstruction method that integrates two well-known algorithms, namely LRA and NLR [
10]. In this study, LR
2A was implemented for the deblurring of images simulated with spatial and spectral aberrations and with a limited numerical aperture. In every case, LR
2A was found to perform better than both LRA and NLR. In the cases where the energy was concentrated in a smaller area, LRA performed better than NLR and vice versa. However, in all cases, LR
2A aligned towards either NLR or LRA and was better than both. The convergence rate of LR2A was also at least an order of magnitude better than that of LRA, and in some cases, the ratio of the number of iterations between LRA and LR2A was >50, which is significant. It must be noted that LR
2A uses the same approach as LRA, which is the estimation of the maximum likelihood solution. However, the estimation is faster than that of LRA, due to the replacement of MF by NLR, and it offers a better estimate. In some cases of reconstruction, the signal strength appears weaker than the original signal due to the tuning of
α and
β to non-unity values. We believe that these preliminary results are promising, and LR
2A can be an attractive tool for pattern recognition applications and image deblurring in incoherent linear shift-invariant imaging systems. There are certain challenges in implementing LR
2A (and LRA) as opposed to NLR. In NLR, the optimal values of
α and
β can be blindly obtained at the minima of entropy, which is not the case in LR
2A. Additionally, NLR is quite stable with non-changing values of
α and
β unless a significant change was made to the experiment. This was not the case with LR
2A, as it was highly sensitive to even minor changes in PSF, object, and experimental conditions. Dedicated research is needed in the future to develop a fast method to find the optimal values of
α,
β, and number of iterations in the case of LR
2A. In the current study, the values of
α and
β were obtained intuitively.
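One possible way to automate the search that was performed intuitively here is a coarse grid scan over (α, β, iterations), scored by the entropy of the reconstruction in analogy to the blind tuning used for NLR. The following is a sketch of one such option, reusing lr2a() from the earlier snippet; it is not the authors' procedure, and the scoring choice is an assumption.

```python
import numpy as np
from itertools import product

def entropy(img, eps=1e-12):
    """Shannon entropy of the normalized intensity distribution."""
    p = img.ravel() / (img.sum() + eps)
    p = p[p > eps]
    return float(-np.sum(p * np.log(p)))

def tune_lr2a(I, psf, alphas, betas, iteration_counts):
    """Return the (alpha, beta, iterations) triple whose LR2A output has minimum entropy."""
    best = None
    for a, b, it in product(alphas, betas, iteration_counts):
        score = entropy(lr2a(I, psf, alpha=a, beta=b, iterations=it))
        if best is None or score < best[0]:
            best = (score, a, b, it)
    return best[1:]

# Example scan (coarse grid; the ranges are arbitrary choices for illustration):
# alpha, beta, iters = tune_lr2a(I, psf, np.linspace(-0.2, 0.4, 4),
#                                np.linspace(0.7, 1.0, 4), [2, 4, 8])
```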
In this study, LR
2A was demonstrated for the first time with spatially and temporally incoherent illumination, making it applicable to a wider range of applications. Optical experiments were carried out using smartphone cameras, and the recorded images were significantly enhanced with LR2A. This approach enables imaging beyond the LDDV limit of smartphone cameras. Deep-learning-based image classification is highly sensitive to the quality of images, as even a minor blur can affect the classification probability significantly. In this study, the invasive and non-invasive LR
2A-based deblurring was shown to improve the classification probability. A significant improvement was also observed in the case of deep-learning-based image classification. The current research outcome also indicates the need for an integrated approach involving signal processing, optical methods, and deep learning to achieve optimal performance [
29].